I chose to set up a Jupyter server locally in PyCharm and run the baseline there.

Install the Jupyter package in PyCharm:

pip install jupyter

Improvements

Improve the prompt engineering.

Problems and solutions

Problem 1: file encoding errors when reading the data

Solution: modify the function so that it reads the data as UTF-8.

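import json
import os

# process_datas and MODEL_NAME come from the rest of the baseline (not shown here).


def main(ifn, ofn):
    if os.path.exists(ofn):
        pass  # no-op: an existing output file is not handled specially
    data = []
    # Read the data line by line, decoding the file explicitly as UTF-8
    with open(ifn, encoding="utf-8") as reader:
        for line in reader:
            sample = json.loads(line)
            data.append(sample)
    datas = data
    # print(data)
    # Split evenly into multiple datasets
    return_list = process_datas(datas, MODEL_NAME)
    print(len(return_list))
    print("All tasks finished!")
    return return_list


# sorted_data is assumed to be defined earlier in the notebook (not shown here).
# Write the results as UTF-8 as well, keeping non-ASCII characters unescaped.
with open('upload.jsonl', 'w', encoding="utf-8") as writer:
    for sample in sorted_data:
        writer.write(json.dumps(sample, ensure_ascii=False))
        writer.write('\n')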
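
A likely cause of Problem 1 is that `open()` without an `encoding` argument falls back to the platform's locale encoding, which on Windows is often not UTF-8. The sketch below tries to reproduce the issue in isolation. The helper names `read_jsonl` and `write_jsonl` and the sample records are hypothetical and are not part of the baseline. It writes a small JSONL file as UTF-8, reads it back with an explicit encoding, and shows that decoding the same bytes with a mismatched codec raises `UnicodeDecodeError`.

```python
import json
import os
import tempfile


def read_jsonl(path, encoding="utf-8"):
    """Read one JSON object per line, decoding the file with an explicit encoding."""
    samples = []
    with open(path, encoding=encoding) as reader:
        for line in reader:
            line = line.strip()
            if line:  # skip blank lines
                samples.append(json.loads(line))
    return samples


def write_jsonl(path, samples):
    """Write one JSON object per line as UTF-8, keeping non-ASCII text unescaped."""
    with open(path, "w", encoding="utf-8") as writer:
        for sample in samples:
            writer.write(json.dumps(sample, ensure_ascii=False))
            writer.write("\n")


# Hypothetical sample records, not taken from the competition data.
demo = [{"id": 1, "text": "你好"}, {"id": 2, "text": "baseline"}]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.jsonl")
    write_jsonl(path, demo)

    # Explicit UTF-8 decoding round-trips the data regardless of platform.
    loaded = read_jsonl(path)

    # Decoding the same bytes with a mismatched codec fails.
    try:
        read_jsonl(path, encoding="ascii")
        wrong_codec_failed = False
    except UnicodeDecodeError:
        wrong_codec_failed = True

print(loaded == demo, wrong_codec_failed)
```

Passing `encoding="utf-8"` on both the read and the write side keeps the behaviour the same across machines, and `ensure_ascii=False` leaves Chinese text human-readable in the output file instead of `\uXXXX` escapes.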

References:

Datawhale (linklearner.com)