Transformers tokenizer attention mask for PyTorch
Question
In my code I have:
output = self.decoder(output, embedded, tgt_mask=attention_mask)
where
decoder_layer = TransformerDecoderLayer(embedding_size, num_heads, hidden_size, dropout, batch_first=True)
self.decoder = TransformerDecoder(decoder_layer, 1)
I generate the attention mask using a Hugging Face tokenizer:
batch = tokenizer(example['text'], return_tensors="pt", truncation=True, max_length=1024, padding='max_length')
inputs = batch['input_ids']
attention_mask = batch['attention_mask']
Running it through the model fails with:
AssertionError: only bool and floating types of attn_mask are supported
Changing the attention mask to attention_mask = batch['attention_mask'].bool()
causes:
RuntimeError: The shape of the 2D attn_mask is torch.Size([4, 1024]), but should be (1024, 1024)
Any idea how I can use a Hugging Face tokenizer with my own PyTorch module?
Answer 1
Score: 2
PyTorch's tgt_mask is not the same as Hugging Face's attention_mask. The latter indicates which tokens are padding:
from transformers import BertTokenizer
t = BertTokenizer.from_pretrained("bert-base-cased")
encoded = t("this is a test", max_length=10, padding="max_length")
print(t.pad_token_id)
print(encoded.input_ids)
print(encoded.attention_mask)
Output:
0
[101, 1142, 1110, 170, 2774, 102, 0, 0, 0, 0]
[1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
PyTorch's equivalent of that is tgt_key_padding_mask.
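Here is a minimal, self-contained sketch of wiring the two together (the sizes and the dummy tgt/memory tensors are hypothetical, only there to make the call shapes concrete). Note the inversion: Hugging Face marks real tokens with 1, while PyTorch's key padding masks expect True at the positions that should be ignored:
import torch
from torch.nn import TransformerDecoder, TransformerDecoderLayer
from transformers import BertTokenizer

# Hypothetical sizes, not taken from the question.
embedding_size, num_heads, hidden_size, dropout = 768, 8, 2048, 0.1
decoder_layer = TransformerDecoderLayer(embedding_size, num_heads, hidden_size, dropout, batch_first=True)
decoder = TransformerDecoder(decoder_layer, 1)

tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
batch = tokenizer(["this is a test"], return_tensors="pt", truncation=True,
                  max_length=32, padding="max_length")

# HF attention_mask: 1 = real token, 0 = padding.
# PyTorch tgt_key_padding_mask: True = padding position, ignore it.
tgt_key_padding_mask = batch["attention_mask"] == 0   # bool, shape (batch, seq_len)

# Dummy target embeddings and encoder memory, only to show the call.
tgt = torch.randn(1, 32, embedding_size)
memory = torch.randn(1, 32, embedding_size)

output = decoder(tgt, memory, tgt_key_padding_mask=tgt_key_padding_mask)
print(output.shape)   # torch.Size([1, 32, 768])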
The tgt_mask, on the other hand, serves a different purpose: it defines which tokens may attend to which other tokens. For an NLP transformer decoder, it is usually used to prevent tokens from attending to future tokens (causal mask). If that is your use case, you can also simply pass tgt_is_causal=True and PyTorch will create the tgt_mask for you.
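Continuing the sketch above (same hypothetical decoder, tgt, memory and tgt_key_padding_mask), the explicit causal mask has the (seq_len, seq_len) shape the error message was asking for and can be passed together with the padding mask:
# Boolean causal mask: True above the diagonal means "may not attend" (future positions).
seq_len = tgt.size(1)
causal_mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)

output = decoder(
    tgt, memory,
    tgt_mask=causal_mask,                       # (seq_len, seq_len): who may attend to whom
    tgt_key_padding_mask=tgt_key_padding_mask,  # (batch, seq_len): which positions are padding
)
On recent PyTorch versions, nn.Transformer.generate_square_subsequent_mask(seq_len) builds an equivalent (float) causal mask, and tgt_is_causal=True is available as described above.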