What is the training data input to the transformers (attention is all you need)?

Question

Sorry I'm new to NLP. Please bear with me. Say I have two sentences:

French: Le chat mange.

English: The cat eats.

In the following text, I will denote a training example as a tuple (x, y), where x is the input data and y is the annotation.

When I train a transformer network, do I A. input these two sentences together as a single training example, i.e. (Le chat mange, The cat eats)? Or do I B. use ((Le chat mange, ), The), ((Le chat mange, The), cat), ((Le chat mange, The cat), eats) as training data?

If it's A, it sounds like I have to wait for the network to produce the words one by one during training, which would not be parallelizable. So I guess it should be B?

Answer 1

Score: 0

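I figured it out. The whole target sentence is fed to the decoder at once, shifted right by one position, and the look-ahead this would otherwise allow is blocked by the "mask" mentioned in the paper. So the training example looks like A, but the network is effectively trained on all the pairs in B in parallel.

The mask looks like this

M=[1, 0, ..., 0
   1, 1, ..., 0
   1, 1, ..., 1] 

In self attention, the matrix QK^T (scaling factor ignored) holds the similarity scores between the "queries" and the "keys". Applying the mask to QK^T removes the scores between the "current query" Q[i,:] and the "future" keys K[i+k,:], for k=1,...,N-i. Note that the paper does this by setting the entries where M is 0 to -inf before the softmax, rather than by elementwise multiplication, so the corresponding attention weights come out as exactly 0.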
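To make this concrete, here is a minimal pure-Python sketch of the idea, not the paper's actual implementation. It assumes hypothetical `<s>` and `</s>` start/end tokens, uses a made-up score matrix in place of a real QK^T, and uses a mask that includes the diagonal so each position can attend to itself. Masked scores are set to -inf before the softmax, as the paper describes. The helper names `causal_mask` and `masked_softmax` are my own.

```python
import math


def causal_mask(n):
    """mask[i][j] is True when query i is allowed to see key j, i.e. j <= i."""
    return [[j <= i for j in range(n)] for i in range(n)]


def masked_softmax(scores, mask):
    """Row-wise softmax with disallowed entries set to -inf beforehand."""
    out = []
    for row, allowed in zip(scores, mask):
        masked = [s if ok else -math.inf for s, ok in zip(row, allowed)]
        peak = max(masked)  # finite, because the diagonal is always allowed
        exps = [math.exp(s - peak) for s in masked]  # exp(-inf) is 0.0
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


# Assumed tokenization of the target sentence (markers are hypothetical).
target = ["<s>", "The", "cat", "eats", "</s>"]
decoder_input = target[:-1]  # target shifted right by one position
labels = target[1:]          # what each position has to predict

# Stand-in for QK^T: arbitrary numbers, only the masking matters here.
n = len(decoder_input)
scores = [[0.1 * (i + 1) * (j + 1) for j in range(n)] for i in range(n)]
weights = masked_softmax(scores, causal_mask(n))

# One forward pass covers every prefix -> next-word pair from option B.
for i, label in enumerate(labels):
    print(decoder_input[: i + 1], "->", label)
```

Every row of `weights` has zeros to the right of the diagonal, so position i only mixes information from positions up to i. That is why a single pass over the full sentence gives the same supervision as the separate prefix examples in B.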

huangapple
  • Published on January 6, 2020 at 02:20:41
  • When reposting, please keep the link to this article: https://go.coder-hub.com/59602891.html