Does OpenAI GPT fine-tuning consider the prompt in the loss function?
Question
The OpenAI API includes a fine-tuning service that divides each training example into a "prompt" and a "completion":
https://platform.openai.com/docs/guides/fine-tuning
The documentation says that the accuracy metrics are calculated with respect to the completion, but for the loss it only says that it is calculated "on the training batch".
My understanding is that the initial training of a GPT model always happens in batches of the maximum available size, using a special token to separate contexts but always asking the model to predict the next token at every position, so there the loss function is the obvious cross-entropy over all the outputs. In fine-tuning, however, there is a choice: learn to predict the "template prompt" or not. Both decisions can be sensible; learning the template amounts to training a parser, while masking the template can avoid overfitting.
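For concreteness, here is a minimal PyTorch sketch of the two options I mean (the tensor shapes and prompt length are made up for illustration): cross-entropy over every position, versus masking the prompt tokens out of the loss.

```python
# Minimal sketch (illustrative shapes only) of the two fine-tuning choices:
# (1) cross-entropy over every position, as in pre-training;
# (2) masking the prompt so only completion tokens contribute to the loss.
import torch
import torch.nn.functional as F

vocab_size, seq_len, prompt_len = 50257, 12, 8      # first 8 tokens are the "template prompt"
logits = torch.randn(seq_len, vocab_size)            # stand-in for model outputs
targets = torch.randint(0, vocab_size, (seq_len,))   # stand-in for next-token labels

# Option 1: every position contributes to the loss.
loss_all = F.cross_entropy(logits, targets)

# Option 2: prompt positions are excluded (ignore_index skips them).
masked = targets.clone()
masked[:prompt_len] = -100
loss_completion_only = F.cross_entropy(logits, masked, ignore_index=-100)
```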
So, what is the current practice at OpenAI?
Answer 1
Score: 2
The OpenAI API has a parameter prompt_loss_weight, whose default is 0.01, compared to the completion, which always has a weight of 1.0. So yes, it considers the prediction of the prompt as part of the loss function.
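OpenAI does not publish the exact formula, but a natural reading of prompt_loss_weight is a per-token weight on the cross-entropy, with prompt positions weighted 0.01 and completion positions weighted 1.0. A minimal PyTorch sketch of that interpretation (the function name and shapes are mine, not OpenAI's):

```python
# Sketch of how a prompt_loss_weight could enter the objective
# (illustrative only; OpenAI does not document the exact formula).
import torch
import torch.nn.functional as F

def weighted_lm_loss(logits, targets, prompt_mask, prompt_loss_weight=0.01):
    """Per-token cross-entropy, down-weighted on prompt positions.

    logits: (seq_len, vocab_size); targets: (seq_len,);
    prompt_mask: bool (seq_len,), True where the token belongs to the prompt.
    """
    per_token = F.cross_entropy(logits, targets, reduction="none")
    weights = torch.where(prompt_mask,
                          torch.full_like(per_token, prompt_loss_weight),
                          torch.ones_like(per_token))
    return (weights * per_token).sum() / weights.sum()
```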
This usage seems different from fine-tuning tutorials for other tools such as the Hugging Face transformers library, which allow a mask to discard part of the output from the loss but do not apply different weights to it.
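For comparison, the usual Hugging Face recipe hard-masks the prompt by setting those label positions to -100, which the built-in cross-entropy ignores, rather than down-weighting them. A small sketch (the model and strings are placeholders, and it assumes the prompt tokenizes to a prefix of the full text):

```python
# Hugging Face-style masking: prompt tokens are excluded from the loss entirely.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "Translate to French: cheese ->"
completion = " fromage"

prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
full_ids = tokenizer(prompt + completion, return_tensors="pt").input_ids

labels = full_ids.clone()
labels[:, :prompt_len] = -100   # -100 positions are ignored by the loss

loss = model(input_ids=full_ids, labels=labels).loss
```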