Transformers tokenizer attention mask for PyTorch

Question

In my code I have:

output = self.decoder(output, embedded, tgt_mask=attention_mask)

where

decoder_layer = TransformerDecoderLayer(embedding_size, num_heads, hidden_size, dropout, batch_first=True)
self.decoder = TransformerDecoder(decoder_layer, 1)

I generate the attention mask using a Hugging Face tokenizer:

batch = tokenizer(example['text'], return_tensors="pt", truncation=True, max_length=1024, padding='max_length')
inputs = batch['input_ids']
attention_mask = batch['attention_mask']

Running it through the model fails with:

AssertionError: only bool and floating types of attn_mask are supported

Changing the attention mask to attention_mask = batch['attention_mask'].bool()

causes:

RuntimeError: The shape of the 2D attn_mask is torch.Size([4, 1024]), but should be (1024, 1024)

Any idea how I can use a Hugging Face tokenizer with my own PyTorch module?

Answer 1

Score: 2

PyTorch's tgt_mask is not the same as Hugging Face's attention_mask. The latter indicates which tokens are padding:

from transformers import BertTokenizer

t = BertTokenizer.from_pretrained("bert-base-cased")

encoded = t("this is a test", max_length=10, padding="max_length")
print(t.pad_token_id)
print(encoded.input_ids)
print(encoded.attention_mask)

Output:

0
[101, 1142, 1110, 170, 2774, 102, 0, 0, 0, 0]
[1, 1, 1, 1, 1, 1, 0, 0, 0, 0]

PyTorch's equivalent of that is tgt_key_padding_mask.
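
A minimal sketch of that conversion, reusing self.decoder, output, embedded and batch from the question (the variable name tgt_key_padding_mask is just illustrative). PyTorch expects True at the positions that should be ignored, i.e. the inverse of the Hugging Face convention:

# Hugging Face: 1 = real token, 0 = padding.
# PyTorch key_padding_mask: True = ignore this position, so invert the mask.
tgt_key_padding_mask = batch['attention_mask'].eq(0)    # bool, shape (batch, seq_len)

output = self.decoder(
    output,                                      # tgt: (batch, seq_len, embedding_size), batch_first=True
    embedded,                                    # memory
    tgt_key_padding_mask=tgt_key_padding_mask,   # mask out padded target positions
)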

The tgt_mask, on the other hand, serves a different purpose: it defines which tokens each token is allowed to attend to. For an NLP transformer decoder, it is usually used to prevent tokens from attending to future tokens (a causal mask). If that is your use case, you can also simply pass tgt_is_causal=True and PyTorch will create the tgt_mask for you.
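
If you do want causal decoding, here is a minimal sketch that builds the explicit (1024, 1024) mask the error message asks for and combines it with the padding mask above (reusing the names from the question; recent PyTorch versions expose generate_square_subsequent_mask as a static method):

import torch

# Float mask with 0 on/below the diagonal and -inf above it,
# so position i can only attend to positions <= i.
causal_mask = torch.nn.Transformer.generate_square_subsequent_mask(1024)

output = self.decoder(
    output,
    embedded,
    tgt_mask=causal_mask,                                   # (seq_len, seq_len) causal mask
    tgt_key_padding_mask=batch['attention_mask'].eq(0),     # padding mask from the tokenizer
)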
