News

On June 12, the Academy unveiled a sculpture of the late Chinese agricultural scientist Yuan Longping, known as the "Father ...
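Answer correctness reward (r_ans): whether the final answer is correct is scored by GPT-4o through semantic evaluation, combined with BLEU similarity. Such a "teacher" is highly robust to variations in natural-language phrasing, keeps the model from exploiting formatting loopholes, and further reduces the cost of human evaluation.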
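The description does not say how the GPT-4o judgment and the BLEU score are combined. The Python sketch below assumes a simple weighted sum, r_ans = α·judge + (1 − α)·BLEU, with a hypothetical weight α = 0.8. `toy_judge` is a hypothetical placeholder for the GPT-4o call, and `sentence_bleu` is a small self-contained, smoothed BLEU, not any particular library's implementation.

```python
import math
from collections import Counter
from typing import Callable, List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate: str, reference: str, max_n: int = 4) -> float:
    """Sentence-level BLEU against one reference, with add-one smoothing.

    Whitespace tokenisation only; a simplified sketch, not a library BLEU.
    """
    cand, ref = candidate.split(), reference.split()
    if not cand or not ref:
        return 0.0
    log_precision = 0.0
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        # Clipped matches: a candidate n-gram counts at most as often as in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(len(cand) - n + 1, 0)
        # Add-one smoothing on every order; orders longer than the candidate
        # get precision 1, so the brevity penalty below does the punishing.
        log_precision += math.log((overlap + 1) / (total + 1)) / max_n
    # Brevity penalty for candidates shorter than the reference.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(log_precision)


def _normalise(text: str) -> str:
    """Lower-case, trim surrounding spaces/periods, collapse whitespace."""
    return " ".join(text.lower().strip(" .").split())


def toy_judge(prediction: str, reference: str) -> float:
    """Hypothetical stand-in for the GPT-4o semantic judge (returns 0 or 1)."""
    return 1.0 if _normalise(prediction) == _normalise(reference) else 0.0


def answer_reward(
    prediction: str,
    reference: str,
    judge: Callable[[str, str], float],
    alpha: float = 0.8,  # assumed weight, not taken from the source
) -> float:
    """Hypothetical r_ans: weighted mix of a semantic judge score and BLEU."""
    semantic = judge(prediction, reference)
    bleu = sentence_bleu(prediction, reference)
    return alpha * semantic + (1 - alpha) * bleu


if __name__ == "__main__":
    ref = "The answer is 42"
    print(answer_reward("the answer is 42.", ref, toy_judge))  # judge match, partial BLEU
    print(answer_reward("7", ref, toy_judge))                  # wrong and short
```

In the usage lines, a prediction that differs from the reference only in case and punctuation gets full credit from the judge while BLEU adds only partial credit, which illustrates why a semantic judge is less sensitive to surface formatting than string overlap alone.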