What is the training data input to the transformers (attention is all you need)?
Question
Sorry, I'm new to NLP. Please bear with me. Say I have two sentences:
French: Le chat mange.
English: The cat eats.
In the following, I will denote a training example as a tuple (x, y), where x is the input data and y is the annotation.
When I train a transformer network, do I A. input these two sentences together as one training example, i.e. (Le chat mange, The cat eats)? Or do I B. use ((Le chat mange, ), The), ((Le chat mange, The), cat), ((Le chat mange, The cat), eats) as training data?

If it's A, it sounds like I have to wait for the network to produce the words one by one during training, which would not be parallelizable. So I guess it should be B?
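For concreteness, here are the two layouts written out as plain Python tuples (illustrative only; no tokenizer or model is involved, and the names src, tgt, option_a, option_b are made up for this sketch):

```python
# Illustrative only: the two candidate training-data layouts from the question.
src = "Le chat mange"
tgt = ["The", "cat", "eats"]

# Option A: one training example per sentence pair.
option_a = (src, " ".join(tgt))
# ('Le chat mange', 'The cat eats')

# Option B: one training example per target word, with the already-produced
# target prefix included in the input.
option_b = [((src, " ".join(tgt[:i])), tgt[i]) for i in range(len(tgt))]
# [(('Le chat mange', ''), 'The'),
#  (('Le chat mange', 'The'), 'cat'),
#  (('Le chat mange', 'The cat'), 'eats')]

print(option_a)
print(option_b)
```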
Answer 1
Score: 0
I figured it out. This "shifting" of the target (output) sentence is done by applying the "mask" mentioned in the paper.
The mask looks like this (lower triangular, with ones on and below the diagonal):

M = [1, 0, ..., 0
     1, 1, ..., 0
     ...
     1, 1, ..., 1]
In self-attention, the matrix QK^T (ignoring the scaling factor) represents the cross-correlation between the "queries" and the "keys". When the mask is applied as M ∘ (QK^T) (where ∘ denotes elementwise multiplication), the correlations between the "current query" Q[i,:] and the "future" keys K[i+k,:], for k = 1, ..., N-i, are ignored. (In the paper this is implemented by setting the masked entries of QK^T to -inf before the softmax, which drives their attention weights to zero.)
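To make this concrete, here is a minimal NumPy sketch of the masked self-attention step with toy shapes and random Q, K, V (illustrative only, not the paper's full multi-head attention); it applies the mask by setting future entries to -inf before the softmax, as described above.

```python
import numpy as np

# Toy masked self-attention over N = 3 decoder positions ("The", "cat", "eats").
# Shapes and values are made up for illustration; d is a toy model dimension.
rng = np.random.default_rng(0)
N, d = 3, 4
Q = rng.normal(size=(N, d))   # queries, one row per target position
K = rng.normal(size=(N, d))   # keys
V = rng.normal(size=(N, d))   # values

scores = Q @ K.T / np.sqrt(d)           # scaled QK^T, shape (N, N)

# Causal mask M: True on and below the diagonal (allowed), False above (future).
M = np.tril(np.ones((N, N), dtype=bool))
scores = np.where(M, scores, -np.inf)   # future entries -> -inf before softmax

# Row-wise softmax: the -inf entries get exactly zero attention weight.
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)

out = weights @ V   # row i mixes only values from positions 0..i
print(np.round(weights, 3))   # upper triangle is all zeros
```

Because the mask is applied to the whole N x N score matrix at once, the per-word supervision of option B is obtained for every target position in a single forward pass, with no sequential loop over words; that is what makes decoder training parallelizable.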