How to get the vector embedding of a token in GPT?

Question

I have a GPT model

model = BioGptForCausalLM.from_pretrained("microsoft/biogpt").to(device)
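
For reference, a batch for this model could be prepared with the matching BioGptTokenizer, along the lines of the sketch below (the example sentences and the fixed padding length of 1024 are placeholders, not details from the original post):

from transformers import BioGptTokenizer

# Tokenizer that matches the microsoft/biogpt checkpoint
tokenizer = BioGptTokenizer.from_pretrained("microsoft/biogpt")

# Placeholder sentences, padded/truncated to a fixed length of 1024 tokens
texts = [
    "Aspirin inhibits platelet aggregation.",
    "Metformin lowers blood glucose levels.",
]
batch = tokenizer(
    texts,
    return_tensors="pt",
    padding="max_length",
    max_length=1024,
    truncation=True,
)
# batch["input_ids"] has shape [2, 1024]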

When I send my batch to it I can get the logits and the hidden states:

out = model(batch["input_ids"].to(device), output_hidden_states=True, return_dict=True)
print(out.keys())
>>> odict_keys(['logits', 'past_key_values', 'hidden_states'])

The logits have a shape of

torch.Size([2, 1024, 42386])

corresponding to (batch_size, seq_length, vocab_size).

How can I get the vector embedding of the first (i.e., dim=0) token in the last layer (i.e., after the fully connected layer)? I believe it should be of size [2, 1024, 1024]

From here it seems like it should be under last_hidden_state, but I can't seem to generate it. out.hidden_states seems to be a tuple of length 25, where each is of dimension [2, 1024, 1024]. I'm wondering if the last one is the one I'm looking for, but I'm not sure.

Answer 1

Score: 2

You are right to use output_hidden_states=True and to look at out.hidden_states. As you mentioned, this element is a tuple of length 25. According to the BioGPT paper and the HuggingFace documentation, your model contains 24 transformer layers, and the 25 elements in the tuple are the output of the initial embedding layer followed by the outputs of each of the 24 layers.

The shape of each of these tensors is [B, L, E], where B is your batch size, L is the length of the input, and E is the embedding dimension. Judging by the shapes you report, it seems you are padding your input to a length of 1024. So the representation of your first token (in the first sentence of the batch) would be out.hidden_states[k][0, 0, :], which has shape [1024]. Here, k denotes the layer you want to use; which one is appropriate depends on what you plan to do with the embedding.
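
As a minimal sketch (reusing model, batch, and device from the question, and taking the last element of the tuple as one possible choice of k):

import torch

with torch.no_grad():
    out = model(batch["input_ids"].to(device), output_hidden_states=True, return_dict=True)

hidden_states = out.hidden_states      # tuple of 25 tensors, each of shape [2, 1024, 1024]
last_layer = hidden_states[-1]         # output of the final (24th) transformer layer
first_token_vec = last_layer[0, 0, :]  # first token of the first batched sentence
print(first_token_vec.shape)           # torch.Size([1024])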
