May 21, 2023 17:06:45


How to properly prompt the decoder of a Transformer model?

Question

I am using Hugging Face Transformers. I have a pretrained encoder-decoder model (Pegasus) and want to fine-tune it as described in this article.

Specifically, they use the following process:

In other words, they prepend a manually constructed prompt to the model's own generated output.

My question is about the decoder input. Specifically, I want to fine-tune the model so that it takes the prompt (the entity chain) and generates a summary from that point onwards.

For instance:

```
<s> [ENTITYCHAIN] Frozen | Disney [SUMMARY] $tok_1 $tok_2 $tok_3 ...
=========================================== ^^^^^^ ^^^^^^ ^^^^^^
This is not generated                       Generate from here
```

However, as you would expect, the model produces a prediction for every token in the entity chain, which I do not need. More importantly, the loss also includes the predictions for the entity-chain tokens. This undermines the purpose of training and confuses the model: it should learn to generate only the summary, not the entity chain, which is already given as a prompt.
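A sketch of why this happens, assuming the usual teacher-forcing setup: when only `labels` are passed, Pegasus and T5 in Hugging Face Transformers build the decoder inputs by shifting the labels one position to the right (starting with the decoder start token, which is the pad token for these models) and replacing any -100 with the pad token. The pure-Python function below mirrors that logic as I understand it; `shift_right_sketch` is a hypothetical name, not the library's own function, and the token IDs are made up for illustration.

```python
IGNORE_INDEX = -100


def shift_right_sketch(labels, pad_token_id, decoder_start_token_id):
    """Pure-Python sketch of how decoder inputs are derived from labels
    when no decoder_input_ids are given (teacher forcing)."""
    shifted = [decoder_start_token_id] + labels[:-1]
    # -100 is not a real token ID, so it is replaced by the pad token
    return [pad_token_id if tok == IGNORE_INDEX else tok for tok in shifted]


# Made-up IDs for "[ENTITYCHAIN] Frozen | Disney [SUMMARY] tok_1 tok_2 </s>":
# 11..15 stand for the entity-chain prompt, 21..22 for the summary, 1 for EOS
labels = [11, 12, 13, 14, 15, 21, 22, 1]
decoder_inputs = shift_right_sketch(labels, pad_token_id=0, decoder_start_token_id=0)

# At step t the decoder reads decoder_inputs[t] (plus earlier inputs) and is
# scored on labels[t], so the prompt tokens 11..15 are predicted and enter the loss
for step, (seen, target) in enumerate(zip(decoder_inputs, labels)):
    print(f"step {step}: input {seen:>2} -> target {target}")
```

Every decoder position is scored against the label at the same position, so nothing in this setup distinguishes prompt tokens from summary tokens.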

To summarize, I want to give the decoder a prompt (the entity chain) and have it generate a summary while attending to the information in the prompt. The loss should be computed only over the generated tokens, excluding the prompt tokens.

Looking through the model documentation, I could not find an option to do this. Any ideas?


Answer 1

Score: 2

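By convention, PyTorch's `CrossEntropyLoss` (and `NLLLoss`) ignore any target whose value is -100, which is their default `ignore_index`. Hugging Face seq2seq models such as T5 and Pegasus compute their loss with `CrossEntropyLoss`, so if you set a label to -100 during training, that token is excluded from the loss. See the documentation for peace of mind.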
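To make the -100 convention concrete, here is a minimal pure-Python sketch of token-level cross-entropy with an ignore index and mean reduction. It is written to mimic what `torch.nn.CrossEntropyLoss` does with its defaults (`ignore_index=-100`, `reduction='mean'`): masked positions are left out of both the sum and the count. The logits are made-up numbers, and the code does not call PyTorch itself.

```python
import math

IGNORE_INDEX = -100


def cross_entropy(logits, targets, ignore_index=IGNORE_INDEX):
    """Mean cross-entropy over positions whose target is not ignore_index."""
    total, count = 0.0, 0
    for row, target in zip(logits, targets):
        if target == ignore_index:
            continue  # masked position: excluded from both the sum and the count
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_norm = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_norm - row[target]
        count += 1
    # If every position is masked the mean is 0/0; PyTorch returns nan in that case
    return total / count if count else float("nan")


# Three decoder positions over a toy 3-token vocabulary (made-up logits)
logits = [[2.0, 0.5, 0.1], [0.2, 1.5, 0.3], [0.1, 0.2, 3.0]]
full = cross_entropy(logits, [0, 1, 2])
masked = cross_entropy(logits, [IGNORE_INDEX, 1, 2])

# Masking the first position gives the same loss as dropping it altogether
print(full, masked, cross_entropy(logits[1:], [1, 2]))
```

This is exactly the behaviour wanted for the prompt tokens: they stay in the sequence but contribute nothing to the loss.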

Here is a minimal example with a Hugging Face model:

```python
# Libraries
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Get the tokenizer and the model
checkpoint = 't5-small'
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

# Sample text
inp = 'Here is my input'
outp = 'Here is my output'

# Get token IDs
inp_ids = tokenizer(inp, return_tensors='pt').input_ids
outp_ids = tokenizer(outp, return_tensors='pt').input_ids

# Calculate the loss over every target token
loss = model(input_ids=inp_ids, labels=outp_ids).loss.item()
print(loss)

# Set the first target token to -100 and recalculate the loss:
# that position is now excluded from the loss average
modified_outp_ids = outp_ids.clone()
modified_outp_ids[0, 0] = -100  # row 0: the only sequence in our batch
# Note: because no decoder_input_ids are passed, T5 builds them by shifting
# the labels right and replacing -100 with the pad token
model_output = model(input_ids=inp_ids, labels=modified_outp_ids)
print(model_output.loss.item())
```
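Applied to the entity-chain setup, one possible recipe (a sketch, not something the answer above spells out) is to build the decoder inputs from the full, unmasked target and to mask only the labels. The helper `build_prompted_targets` is hypothetical. It assumes the prompt IDs were tokenized without a trailing EOS (for example with `add_special_tokens=False`, since Pegasus and T5 tokenizers otherwise append `</s>`), that the summary IDs end with EOS, and that the decoder start token is 0 as in Pegasus and T5; the IDs in the example are made up.

```python
IGNORE_INDEX = -100


def build_prompted_targets(prompt_ids, summary_ids, decoder_start_token_id):
    """Hypothetical helper: return (decoder_input_ids, labels) for one example whose
    decoder sequence is prompt + summary, with the loss restricted to the summary.

    prompt_ids:  IDs of e.g. "[ENTITYCHAIN] Frozen | Disney [SUMMARY]" (no EOS)
    summary_ids: IDs of the summary, ending with EOS
    """
    target = list(prompt_ids) + list(summary_ids)
    # Decoder inputs come from the *unmasked* target, so the decoder still reads the prompt
    decoder_input_ids = [decoder_start_token_id] + target[:-1]
    # Labels mask the prompt positions, so only the summary tokens are scored
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(summary_ids)
    return decoder_input_ids, labels


# Made-up IDs: a 5-token prompt, then the summary "tok_1 tok_2" plus EOS (1)
dec, lab = build_prompted_targets([11, 12, 13, 14, 15], [21, 22, 1], decoder_start_token_id=0)
print(dec)  # [0, 11, 12, 13, 14, 15, 21, 22]
print(lab)  # [-100, -100, -100, -100, -100, 21, 22, 1]
```

With PyTorch tensors the two lists would be passed together, e.g. `model(input_ids=..., decoder_input_ids=torch.tensor([dec]), labels=torch.tensor([lab]))`. Passing `decoder_input_ids` explicitly is the important part: if only the masked labels were given, the model would shift them right and replace the -100 entries with the pad token, so the decoder would never see the entity chain. For batches, pad both lists to a common length and use -100 for the padded label positions. At inference time, `generate` should accept the decoder start token followed by the prompt IDs as `decoder_input_ids` and continue generating after `[SUMMARY]`.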

