How to get the vector embedding of a token in GPT?

Question
I have a GPT model:

```python
model = BioGptForCausalLM.from_pretrained("microsoft/biogpt").to(device)
```

When I send my batch to it I can get the logits and the hidden states:

```python
out = model(batch["input_ids"].to(device), output_hidden_states=True, return_dict=True)
print(out.keys())
>>> odict_keys(['logits', 'past_key_values', 'hidden_states'])
```
The logits have a shape of `torch.Size([2, 1024, 42386])`, corresponding to `(batch, seq_length, vocab_length)`.

How can I get the vector embedding of the first (i.e., `dim=0`) token in the last layer (i.e., after the fully connected layer)? I believe it should be of size `[2, 1024, 1024]`.

From here it seems like it should be under `last_hidden_state`, but I can't seem to generate it. `out.hidden_states` seems to be a tuple of length 25, where each element is of dimension `[2, 1024, 1024]`. I'm wondering if the last one is the one I'm looking for, but I'm not sure.
Answer 1

Score: 2
You are right to use `output_hidden_states=True` and to look at `out.hidden_states`. This element is a tuple of length 25, as you mentioned. According to the BioGPT paper and the HuggingFace doc, your model contains 24 transformer layers, and the 25 elements in the tuple are the output of the initial embedding layer followed by the outputs of each of the 24 layers.
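For instance, continuing from the `out` object produced by the forward pass in your question, you could verify this structure directly (a small sketch, assuming that same call):

```python
# 1 embedding-layer output + 24 transformer-layer outputs = 25 tensors
print(len(out.hidden_states))         # 25
print(out.hidden_states[0].shape)     # embedding-layer output, [B, L, E]
print(out.hidden_states[-1].shape)    # last transformer layer, [B, L, E]
```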
The shape of each of these tensors is `[B, L, E]`, where `B` is your batch size, `L` is the length of the input, and `E` is the dimension of your embedding. Judging by the shapes you reported, it looks like you are padding your input to 1024. So the representation of your first token (in the first sentence of the batch) would be `out.hidden_states[k][0, 0, :]`, which is of shape `[1024]`. Here, `k` denotes the layer you want to use; which one to pick is up to you and depends on what you will do with it.
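Putting it together, here is a minimal end-to-end sketch. The tokenizer usage and the two example sentences are my own assumptions for illustration; only the model name and the `output_hidden_states=True` forward pass come from your question:

```python
import torch
from transformers import BioGptTokenizer, BioGptForCausalLM

device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = BioGptTokenizer.from_pretrained("microsoft/biogpt")
model = BioGptForCausalLM.from_pretrained("microsoft/biogpt").to(device)
model.eval()

# Two illustrative sentences, padded to a common length (assumed inputs)
batch = tokenizer(
    ["COVID-19 is caused by SARS-CoV-2.",
     "Aspirin inhibits platelet aggregation."],
    return_tensors="pt",
    padding=True,
)

with torch.no_grad():
    out = model(
        batch["input_ids"].to(device),
        attention_mask=batch["attention_mask"].to(device),
        output_hidden_states=True,
        return_dict=True,
    )

# hidden_states[-1] is the output of the last (24th) transformer layer: [B, L, E]
last_layer = out.hidden_states[-1]
# Representation of the first token of the first sentence: shape [E] = [1024]
first_token_vec = last_layer[0, 0, :]
print(last_layer.shape, first_token_vec.shape)
```

With padded batches it is generally a good idea to also pass the `attention_mask` along with `input_ids`, as in the sketch, so the model knows which positions are padding.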