I chose to deploy a Jupyter server locally in PyCharm and run the baseline there.
Installing Jupyter in PyCharm
Plugin:
pip install jupyter
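After installing, a quick sanity check is to confirm the packages are importable from the interpreter PyCharm is configured to use for the project. This is a minimal sketch: the helper name missing_packages is hypothetical, and the module list assumes the usual dependencies pulled in by the jupyter metapackage.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level module names that the current interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumption: `pip install jupyter` should bring in these modules
    # (jupyter_core, ipykernel, notebook) among others.
    missing = missing_packages(["jupyter_core", "ipykernel", "notebook"])
    print("missing:", missing or "none")
```

Run it with PyCharm's project interpreter; if anything is listed as missing, pip most likely installed into a different environment.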
Improvements
Improve the prompt engineering
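Problems and solutions
Problem 1: encoding errors when reading the data file
Solution: modify the function to read the data (and write the output) with UTF-8 encoding.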
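import json
import os


def main(ifn, ofn):
    # No-op placeholder kept from the baseline: nothing happens if the output file exists
    if os.path.exists(ofn):
        pass
    data = []
    # Read the JSONL input line by line, explicitly as UTF-8
    with open(ifn, encoding="utf-8") as reader:
        for line in reader:
            sample = json.loads(line)
            data.append(sample)
    datas = data
    # print(data)
    # Split evenly into several sub-datasets
    # (process_datas and MODEL_NAME are defined elsewhere in the baseline)
    return_list = process_datas(datas, MODEL_NAME)
    print(len(return_list))
    print("All tasks finished!")
    return return_list


# Write the results as UTF-8 too; ensure_ascii=False keeps Chinese characters readable
# (sorted_data is produced elsewhere in the baseline)
with open('upload.jsonl', 'w', encoding="utf-8") as writer:
    for sample in sorted_data:
        writer.write(json.dumps(sample, ensure_ascii=False))
        writer.write('\n')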
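Why the explicit encoding matters: when open() is called without an encoding, Python falls back to the locale's preferred encoding, which on a Chinese-locale Windows machine is typically GBK (cp936) rather than UTF-8, so reading a UTF-8 dataset can fail with UnicodeDecodeError. That this was the exact cause here is an assumption. The sketch below is a self-contained UTF-8 JSONL round trip; the helper names read_jsonl and write_jsonl are hypothetical and not part of the baseline.

```python
import json
import os
import tempfile


def read_jsonl(path):
    """Read a JSONL file as UTF-8, one JSON object per non-empty line."""
    samples = []
    with open(path, encoding="utf-8") as reader:
        for line in reader:
            line = line.strip()
            if line:  # skip blank lines, e.g. a trailing empty line
                samples.append(json.loads(line))
    return samples


def write_jsonl(path, samples):
    """Write samples as UTF-8 JSONL; ensure_ascii=False stores non-ASCII text as-is."""
    with open(path, "w", encoding="utf-8") as writer:
        for sample in samples:
            writer.write(json.dumps(sample, ensure_ascii=False))
            writer.write("\n")


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "demo.jsonl")
    demo = [{"id": 1, "question": "什么是机器学习?"}, {"id": 2, "question": "hello"}]
    write_jsonl(path, demo)
    assert read_jsonl(path) == demo

    # Decoding the same bytes with the wrong codec fails, which mirrors the original problem
    try:
        with open(path, encoding="ascii") as f:
            f.read()
    except UnicodeDecodeError:
        print("wrong encoding -> UnicodeDecodeError")
```

Passing encoding="utf-8" on both the read and the write side makes the script behave the same regardless of the operating system's locale.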