How to get the vector embedding of a token in GPT?

Question
I have a GPT model:

```python
model = BioGptForCausalLM.from_pretrained("microsoft/biogpt").to(device)
```

When I send my batch to it I can get the logits and the hidden states:

```python
out = model(batch["input_ids"].to(device), output_hidden_states=True, return_dict=True)
print(out.keys())
>>> odict_keys(['logits', 'past_key_values', 'hidden_states'])
```
The logits have a shape of `torch.Size([2, 1024, 42386])`, corresponding to `(batch, seq_length, vocab_length)`.

How can I get the vector embedding of the first (i.e., `dim=0`) token in the last layer (i.e., after the fully connected layer)? I believe it should be of size `[2, 1024, 1024]`.

From here it seems like it should be under `last_hidden_state`, but I can't seem to generate it. `out.hidden_states` seems to be a tuple of length 25, where each element is of dimension `[2, 1024, 1024]`. I'm wondering if the last one is the one I'm looking for, but I'm not sure.
Answer 1

Score: 2
You are right to use `output_hidden_states=True` and to look at `out.hidden_states`. This element is a tuple of length 25, as you mentioned. According to the BioGPT paper and the HuggingFace doc, your model contains 24 transformer layers, and the 25 elements in the tuple are the output of the initial embedding layer followed by the outputs of each of the 24 layers.
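For instance, continuing from the `out` object produced by the forward pass in your question, you could verify this structure directly (a small sketch, assuming that same call):

```python
# 1 embedding-layer output + 24 transformer-layer outputs = 25 tensors
print(len(out.hidden_states))         # 25
print(out.hidden_states[0].shape)     # embedding-layer output, [B, L, E]
print(out.hidden_states[-1].shape)    # last transformer layer, [B, L, E]
```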
The shape of each of these tensors is `[B, L, E]`, where `B` is your batch size, `L` is the length of the input, and `E` is the dimension of your embedding. Judging by the shapes you reported, it looks like you are padding your input to 1024. So the representation of your first token (in the first sentence of the batch) would be `out.hidden_states[k][0, 0, :]`, which is of shape `[1024]`. Here, `k` denotes the layer you want to use; which one to pick is up to you and depends on what you will do with it.
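Putting it together, here is a minimal end-to-end sketch. The tokenizer usage and the two example sentences are my own assumptions for illustration; only the model name and the `output_hidden_states=True` forward pass come from your question:

```python
import torch
from transformers import BioGptTokenizer, BioGptForCausalLM

device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = BioGptTokenizer.from_pretrained("microsoft/biogpt")
model = BioGptForCausalLM.from_pretrained("microsoft/biogpt").to(device)
model.eval()

# Two illustrative sentences, padded to a common length (assumed inputs)
batch = tokenizer(
    ["COVID-19 is caused by SARS-CoV-2.",
     "Aspirin inhibits platelet aggregation."],
    return_tensors="pt",
    padding=True,
)

with torch.no_grad():
    out = model(
        batch["input_ids"].to(device),
        attention_mask=batch["attention_mask"].to(device),
        output_hidden_states=True,
        return_dict=True,
    )

# hidden_states[-1] is the output of the last (24th) transformer layer: [B, L, E]
last_layer = out.hidden_states[-1]
# Representation of the first token of the first sentence: shape [E] = [1024]
first_token_vec = last_layer[0, 0, :]
print(last_layer.shape, first_token_vec.shape)
```

With padded batches it is generally a good idea to also pass the `attention_mask` along with `input_ids`, as in the sketch, so the model knows which positions are padding.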