I chose to deploy a Jupyter server locally in PyCharm and run the baseline there.
Install Jupyter into PyCharm's Python environment:
pip install jupyter
Improvements
Improve the prompt engineering
Problems and solutions
Problem 1: file encoding error when reading the data


Solution: modify the function so that it opens the input file with encoding="utf-8".
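import json
import os

# process_datas and MODEL_NAME are defined elsewhere in the baseline script
def main(ifn, ofn):
    if os.path.exists(ofn):
        pass  # placeholder from the baseline: an existing output file is neither skipped nor removed
    data = []
    # Read the JSONL input line by line, decoding explicitly as UTF-8
    with open(ifn, encoding="utf-8") as reader:
        for line in reader:
            sample = json.loads(line)
            data.append(sample)
    datas = data
    # print(data)
    # Split the data evenly into multiple subsets
    return_list = process_datas(datas, MODEL_NAME)
    print(len(return_list))
    print("All tasks finished!")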
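    return return_list

Likewise, write the output file with UTF-8 encoding (sorted_data comes from the baseline script):
with open('upload.jsonl', 'w', encoding="utf-8") as writer:
    for sample in sorted_data:
        # ensure_ascii=False keeps Chinese characters readable instead of \uXXXX escapes
        writer.write(json.dumps(sample, ensure_ascii=False))
        writer.write('\n')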
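The fix above boils down to always passing encoding="utf-8" when reading and writing the JSONL files, because without it Python falls back to the platform's default encoding (on Chinese-locale Windows this is typically GBK/cp936, which cannot decode arbitrary UTF-8 text). The following is a minimal self-contained sketch of that pattern; the helper names read_jsonl and write_jsonl and the demo file name are hypothetical and are not part of the baseline.

```python
import json
from pathlib import Path
from typing import Iterable, List


def read_jsonl(path: str) -> List[dict]:
    """Hypothetical helper: read a JSON Lines file, decoding explicitly as UTF-8."""
    records = []
    # encoding="utf-8" overrides the platform default (e.g. GBK/cp936 on Chinese Windows)
    with open(path, encoding="utf-8") as reader:
        for line in reader:
            line = line.strip()
            if line:  # skip blank lines such as a trailing empty line
                records.append(json.loads(line))
    return records


def write_jsonl(path: str, records: Iterable[dict]) -> None:
    """Hypothetical helper: write records as JSON Lines in UTF-8."""
    with open(path, "w", encoding="utf-8") as writer:
        for record in records:
            # ensure_ascii=False writes Chinese text as-is instead of \uXXXX escapes
            writer.write(json.dumps(record, ensure_ascii=False))
            writer.write("\n")


if __name__ == "__main__":
    # Round-trip a sample containing Chinese text through a demo file
    demo = [{"id": "round_1", "question": "以下哪个选项正确?"}]
    write_jsonl("demo.jsonl", demo)
    assert read_jsonl("demo.jsonl") == demo
    print(Path("demo.jsonl").read_text(encoding="utf-8"))
```

Opening both sides with the same explicit encoding makes the script behave the same on Windows, macOS and Linux, regardless of the system locale.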
