Difference between Instruction Tuning vs Non Instruction Tuning Large Language Models
Question
What is the difference between instruction tuning and normal fine-tuning for large language models?
Also, the instruction tuning I'm referring to isn't the in-context/prompting kind.
All the recent papers about fine-tuning seem to be about instruction tuning.
I have looked at a couple of papers about fine-tuning/instruction tuning (e.g. FLAN) and none really describe the difference between instruction tuning and the alternatives (whatever the alternatives are).
I understand that instruction tuning is a form of fine-tuning, but with an instruction dataset. But aren't all datasets instruction datasets? What other kinds are there?
Answer 1
Score: 7
As you said, fine-tuning and instruction tuning are not mutually exclusive: instruction tuning is a form of (supervised) fine-tuning. So there is no feature of fine-tuning in general that sets it apart from instruction tuning, only the other way around. The answer to your first question is therefore "No" (I read it as "Is every dataset an instruction dataset?", not as "Do instruction datasets even exist?").
What is special about instruction tuning is that the model is fine-tuned for an instruction-following task: the input instructs the receiver to perform another task. In other words, there is a second "level" of tasks (e.g. "Split the following number into digits") that is defined only in the instructions, which are part of the model's input sequence.
In classical supervised fine-tuning there are no instructions. You tune the model directly to perform a single downstream task, e.g. to split an input number into digits, without the model input explicitly telling it to do so. (However, there are also hybrid approaches that involve both fine-tuning and explicit instructions.)
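To make the contrast concrete, here is a minimal sketch of what training examples could look like in each setting. The field names ("input", "target") and all examples are made up for illustration; they are not taken from FLAN or any real dataset.

```python
# Hypothetical toy data; field names and examples are illustrative only.

def split_digits(number: str) -> str:
    """Reference output for the toy task, e.g. '472' -> '4 7 2'."""
    return " ".join(number)

# Classical supervised fine-tuning: one fixed task, and the input
# carries no instruction. The task is implied by the dataset itself.
classical_examples = [
    {"input": n, "target": split_digits(n)} for n in ["472", "19", "8003"]
]

# Instruction tuning: many different tasks, each one stated inside
# the input sequence next to the data it applies to.
instruction_examples = [
    {"input": "Split the following number into digits: 472", "target": "4 7 2"},
    {"input": "Reverse the following number: 472", "target": "274"},
    {"input": "Translate to German: Good morning", "target": "Guten Morgen"},
]

for ex in classical_examples + instruction_examples:
    print(f"{ex['input']!r} -> {ex['target']!r}")
```

Note that the same raw input ("472") maps to different targets in the second list, depending only on the instruction in front of it.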
So although the word "task" is often used to refer to either, it is essential to conceptually distinguish between:
- the task the model is fine-tuned for (if at all),
- the task the end-user wants the model to perform, and
- the way inputs for either of these tasks are presented to the model
- (and also the corresponding datasets and statistical distributions)
In summary, one could say that in instruction following, the actual task is determined dynamically, at inference time, while in the classical fine-tuning approach without instructions or similar devices, the actual task is determined statically, at training time.
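As a rough analogy (not a real model), the static/dynamic distinction can be sketched with two plain functions. Everything here is hypothetical: the lookup table only stands in for behavior that a real instruction-tuned model would have to learn and generalize to unseen instructions.

```python
from typing import Callable, Dict

# Stand-in for a classically fine-tuned model: the task was fixed at
# training time, so the input is just the data.
def digit_splitter(text: str) -> str:
    return " ".join(text)

# Stand-in for an instruction-tuned model: the task is read from the
# input at inference time. A lookup table is only a toy substitute.
TASKS: Dict[str, Callable[[str], str]] = {
    "Split the following number into digits": lambda s: " ".join(s),
    "Reverse the following number": lambda s: s[::-1],
}

def instruction_follower(prompt: str) -> str:
    # Split the prompt into the instruction and the data it applies to.
    instruction, _, payload = prompt.partition(": ")
    return TASKS[instruction](payload)

print(digit_splitter("472"))
print(instruction_follower("Split the following number into digits: 472"))
print(instruction_follower("Reverse the following number: 472"))
```

The first function can only ever split digits; the second does whatever its input asks for, within the set of tasks it knows.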
Your confusion might be connected to the fact that prompting, which is another widespread adaptation technique, can involve an abstract description of the task (e.g. in zero-shot learning), which can be formulated as an instruction.
But again, this is not necessary: a few-shot prompt need not include an abstract description of the task. It may consist only of input-output examples of the task, plus the input for which the model should create an output.
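The two prompt styles might be built like this. This is a sketch under assumptions: the "Input:"/"Output:" layout and the helper names are arbitrary choices for illustration, not a standard format.

```python
from typing import List, Tuple

def zero_shot_prompt(instruction: str, query: str) -> str:
    """Abstract task description (an instruction) plus the input, no examples."""
    return f"{instruction}\nInput: {query}\nOutput:"

def few_shot_prompt(examples: List[Tuple[str, str]], query: str) -> str:
    """Only input-output examples plus the input; the task is never named."""
    blocks = [f"Input: {x}\nOutput: {y}" for x, y in examples]
    blocks.append(f"Input: {query}\nOutput:")
    return "\n\n".join(blocks)

zero_shot = zero_shot_prompt("Split the following number into digits.", "472")
few_shot = few_shot_prompt([("19", "1 9"), ("8003", "8 0 0 3")], "472")

print(zero_shot)
print("---")
print(few_shot)
```

In the few-shot prompt the model has to infer the task from the examples alone, which is why prompting and instructions overlap but are not the same thing.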
To answer your second question: you can find many datasets/benchmarks on the Hugging Face Hub. If you click on a few of them at random, you will see in the preview that most of them don't contain any instructions.