Why is scalar multiplication before einsum faster?

Question

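In the TensorFlow Keras implementation of Multi-Head Attention, instead of evaluating the numerator first, as in

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V,$$

they evaluate Q/√dₖ first and add the following comment: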
> Note: Applying scalar multiply at the smaller end of einsum improves
> XLA performance, but may introduce slight numeric differences in
> the Transformer attention head.

How is it faster this way? Wouldn't the division after the einsum be equally fast?
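In exact arithmetic the two orderings give identical scores, since (cQ)Kᵀ = c(QKᵀ) for a scalar c. In floating point they can differ in the last bits, which is presumably the "slight numeric differences" the comment refers to. A minimal pure-Python sketch for a single batch element and head; the helper names are hypothetical and not Keras APIs:

```python
import math
import random


def matmul_transpose(q, k):
    """Return q @ k.T for row-major lists of lists (one batch element, one head)."""
    return [[sum(qi * ki for qi, ki in zip(q_row, k_row)) for k_row in k] for q_row in q]


def scores_scale_after(q, k, key_dim):
    # Ordering 1: einsum first, then scale the T x S result (T * S multiplies).
    scale = 1.0 / math.sqrt(key_dim)
    return [[s * scale for s in row] for row in matmul_transpose(q, k)]


def scores_scale_before(q, k, key_dim):
    # Ordering 2 (what the Keras comment describes): scale Q first (T * key_dim multiplies).
    scale = 1.0 / math.sqrt(key_dim)
    q_scaled = [[x * scale for x in row] for row in q]
    return matmul_transpose(q_scaled, k)


random.seed(0)
T, S, KEY_DIM = 4, 6, 8
q = [[random.uniform(-1, 1) for _ in range(KEY_DIM)] for _ in range(T)]
k = [[random.uniform(-1, 1) for _ in range(KEY_DIM)] for _ in range(S)]

after = scores_scale_after(q, k, KEY_DIM)
before = scores_scale_before(q, k, KEY_DIM)
# Largest elementwise gap between the two orderings: rounding noise only.
max_diff = max(abs(a - b) for ra, rb in zip(after, before) for a, b in zip(ra, rb))
print(f"max |difference| = {max_diff:.3e}")
```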

Answer 1

Score: 1

What the comment suggests is that the number of elements in `query` is smaller than the number of elements in `attention_scores`, the result of the following einsum:

```python
attention_scores = tf.einsum(self._dot_product_equation, key, query)
```

Given the dimensions:

- `query`: Projected query `Tensor` of shape `(B, T, N, key_dim)`.
- `key`: Projected key `Tensor` of shape `(B, S, N, key_dim)`.
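To make these shapes concrete, here is a loop version of the contraction, assuming the einsum is a batched dot product over `key_dim` that produces scores laid out as `(B, N, T, S)`. The function name is hypothetical; it only mirrors what the einsum is assumed to compute. Each score costs `key_dim` multiplications, so the whole einsum does roughly B·N·T·S·key_dim of them.

```python
def dot_product_scores(query, key):
    """Batched dot product over key_dim, written as explicit loops.

    query[b][t][n][d] has shape (B, T, N, key_dim) and key[b][s][n][d] has
    shape (B, S, N, key_dim); the result scores[b][n][t][s] has the assumed
    layout (B, N, T, S).
    """
    batch, q_len, heads, key_dim = (
        len(query), len(query[0]), len(query[0][0]), len(query[0][0][0])
    )
    k_len = len(key[0])
    return [
        [
            [
                [
                    sum(query[b][t][n][d] * key[b][s][n][d] for d in range(key_dim))
                    for s in range(k_len)
                ]
                for t in range(q_len)
            ]
            for n in range(heads)
        ]
        for b in range(batch)
    ]


# Tiny example: B=1, T=2, N=1, key_dim=3 for the query; S=4 for the key.
query = [[[[1.0] * 3] for _ in range(2)]]
key = [[[[2.0] * 3] for _ in range(4)]]
scores = dot_product_scores(query, key)
print(len(scores), len(scores[0]), len(scores[0][0]), len(scores[0][0][0]))  # 1 1 2 4
```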

Assuming that `_dot_product_equation` simply performs a batched matrix multiplication: if Q is T x N and K is S x N, the product Q @ K.T is T x S. Scaling Q takes T * N multiplications, while scaling the product takes T * S, so if S > N, fewer multiplications are expected when the scalar is applied on the left (the Q side).
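The same count with the 4-D shapes listed above: scaling `query` touches B·T·N·key_dim elements, while scaling the scores touches B·N·T·S, so the Q side appears cheaper whenever S > key_dim (the per-head size that the 2-D view above writes as N). A sketch with hypothetical sizes chosen only for illustration:

```python
from math import prod


def scalar_mul_cost(shape):
    """Multiplications needed to scale every element of a tensor of this shape."""
    return prod(shape)


# Hypothetical sizes: batch, query length, key length, heads, per-head size.
B, T, S, N, KEY_DIM = 8, 128, 512, 12, 64

cost_scale_query = scalar_mul_cost((B, T, N, KEY_DIM))  # scale before the einsum
cost_scale_scores = scalar_mul_cost((B, N, T, S))       # scale after the einsum

print(cost_scale_query, cost_scale_scores)   # 786432 6291456
print(cost_scale_scores / cost_scale_query)  # 8.0, i.e. S / KEY_DIM
```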

But either way, that should not be the dominant cost unless S > T * N (or XLA has a bug).
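Counting only multiplications (and ignoring memory traffic and whatever fusion XLA performs), both placements of the scalar are small next to the einsum itself: scaling Q is 1/S of the einsum's multiply count and scaling the scores is 1/key_dim of it. A sketch using the same hypothetical sizes as above:

```python
from math import prod

# Hypothetical sizes: batch, query length, key length, heads, per-head size.
B, T, S, N, KEY_DIM = 8, 128, 512, 12, 64

# Each of the B*N*T*S scores is a dot product of length KEY_DIM.
einsum_mults = prod((B, N, T, S, KEY_DIM))
scale_query_mults = prod((B, T, N, KEY_DIM))
scale_scores_mults = prod((B, N, T, S))

print(f"scale Q / einsum:      {scale_query_mults / einsum_mults:.4%}")
print(f"scale scores / einsum: {scale_scores_mults / einsum_mults:.4%}")
```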
