How to properly prompt the decoder of a Transformer model?


Question

I am using Hugging Face Transformers. I have a pretrained Encoder + Decoder model (Pegasus), and want to fine-tune it as described in this article.

Specifically, they use the following process:

[Figure from the article: the entity-chain prompting scheme]

In other words, they prepend a manual prompt to the model's own generation.

My question relates to the decoder input. Specifically, I want to fine-tune the model so that it takes the prompt (the entity chain) and generates a summary from that point onwards.

For instance:

  <s> [ENTITYCHAIN] Frozen | Disney [SUMMARY] $tok_1 $tok_2 $tok_3 ...
  =========================================== ^^^^^^ ^^^^^^ ^^^^^^
             This is not generated             Generate from here

However, as you would expect, the model generates predictions for every token in the entity chain, which I do not need. More importantly, the loss is also computed over the predictions for the entity-chain tokens. This undermines the purpose of training and confuses the model: it should learn to generate only the summary, not the entity chain, which is already given as a prompt.

To restate: I want to give a prompt (the entity chain) to the decoder and have it generate a summary from there, while still attending to the extra information in the prompt. The loss should of course be computed only over the generated tokens, excluding the prompt tokens.
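
To make this concrete, here is a rough sketch of how I imagine the decoder-side data would be prepared (the checkpoint name, the [ENTITYCHAIN]/[SUMMARY] markers and the example texts are only illustrative):

  # Rough sketch of the intended decoder-side layout (illustrative only)
  from transformers import AutoTokenizer

  tokenizer = AutoTokenizer.from_pretrained('google/pegasus-xsum')  # assumed checkpoint

  prompt = '[ENTITYCHAIN] Frozen | Disney [SUMMARY]'  # given to the decoder, should not be scored
  summary = 'Frozen is a Disney film.'                # should be generated and scored

  prompt_ids = tokenizer(prompt, add_special_tokens=False).input_ids
  summary_ids = tokenizer(summary).input_ids  # keeps the final </s>

  # During teacher forcing the decoder should see prompt_ids + summary_ids,
  # but only the summary_ids positions should contribute to the loss.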

Looking through the model documentation, I can't seem to find an option to do this. Any ideas?


Answer 1

Score: 2

A convention that PyTorch loss functions use is that if you set a label to -100 during training, the loss function will ignore that token. See the documentation for peace of mind.
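
At the raw PyTorch level this is just the default ignore_index of nn.CrossEntropyLoss, which you can check directly (random logits, purely illustrative):

  import torch
  import torch.nn as nn

  loss_fn = nn.CrossEntropyLoss()      # ignore_index defaults to -100
  logits = torch.randn(3, 5)           # 3 positions, vocabulary of 5
  labels = torch.tensor([2, -100, 4])  # the middle position is ignored
  print(loss_fn(logits, labels))       # averaged over the two non-ignored positions only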

Here's a minimal code example:

  # Libraries
  from copy import deepcopy
  from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

  # Get the tokenizer and the model
  checkpoint = 't5-small'
  tokenizer = AutoTokenizer.from_pretrained(checkpoint)
  model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

  # Sample text
  inp = 'Here is my input'
  outp = 'Here is my output'

  # Get token IDs
  inp_ids = tokenizer(inp, return_tensors='pt').input_ids
  outp_ids = tokenizer(outp, return_tensors='pt').input_ids

  # Calculate loss
  loss = model(input_ids=inp_ids, labels=outp_ids).loss.item()
  print(loss)

  # Let's set the first token to -100 and recalculate the loss
  modified_outp_ids = deepcopy(outp_ids)
  modified_outp_ids[0][0] = -100  # the first [0] is because there is only one sequence in the batch
  model_output = model(input_ids=inp_ids, labels=modified_outp_ids)
  print(model_output.loss.item())
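
Applied to the entity-chain case from the question, you would mask the prompt positions of the labels with -100. One detail to watch: if you pass only labels, Pegasus-style models build decoder_input_ids by shifting the labels and replacing -100 with the pad token, so the decoder would never actually see the entity chain. Passing decoder_input_ids explicitly avoids that. A minimal sketch, assuming a Pegasus checkpoint and illustrative texts and markers:

  import torch
  from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

  checkpoint = 'google/pegasus-xsum'  # assumed checkpoint, adjust to your model
  tokenizer = AutoTokenizer.from_pretrained(checkpoint)
  model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

  document = 'Frozen is a 2013 animated film produced by Walt Disney Animation Studios.'
  prompt = '[ENTITYCHAIN] Frozen | Disney [SUMMARY]'   # entity-chain prompt
  summary = 'Frozen is a Disney film.'                 # reference summary

  input_ids = tokenizer(document, return_tensors='pt').input_ids

  # Tokenize prompt and summary separately so we know how many positions to mask
  prompt_ids = tokenizer(prompt, add_special_tokens=False).input_ids
  summary_ids = tokenizer(summary).input_ids            # ends with </s>
  target_ids = prompt_ids + summary_ids

  # Decoder input: <decoder_start> + target shifted right, so the decoder
  # can attend to the entity chain during teacher forcing
  decoder_input_ids = torch.tensor([[model.config.decoder_start_token_id] + target_ids[:-1]])

  # Labels: -100 on the prompt positions (ignored by the loss), real token IDs on the summary
  labels = torch.tensor([[-100] * len(prompt_ids) + summary_ids])

  loss = model(input_ids=input_ids,
               decoder_input_ids=decoder_input_ids,
               labels=labels).loss
  print(loss.item())

At inference time the same idea applies: the tokenized entity chain can be passed as decoder_input_ids to model.generate so that it acts as a forced prefix, although the handling of user-supplied decoder_input_ids has varied somewhat across transformers versions, so it is worth checking the decoded output.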
