What does the implementation of keras.losses.sparse_categorical_crossentropy look like?

Question

I found that tf.keras.losses.sparse_categorical_crossentropy is an amazing class that helps me create a loss function for a neural network with a large number of output classes. Without it, training the model is practically impossible: tf.keras.losses.categorical_crossentropy gave me an out-of-memory error because it converts each index into a one-hot vector of very large size.

However, I have trouble understanding how sparse_categorical_crossentropy avoids the big memory issue. I took a look at the TF source code, but it is not easy to see what goes on under the hood.

So, could anyone give a high-level idea of how this is implemented? What does the implementation look like?
Thank you!

Answer 1

Score: 2

It does not do anything special: it produces the one-hot encoded labels inside the loss, for one batch of data at a time (not for all the data at once) and only when needed, and then discards the result. So it is just a classic trade-off between memory and computation. A sketch of this idea follows.
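A minimal sketch of that idea (an illustration of the trade-off, not the actual TF source; the function name is made up):

    import tensorflow as tf

    def sparse_cce_via_one_hot(y_true, y_pred):
        # y_true: integer class indices, shape (batch,)
        # y_pred: predicted probabilities, shape (batch, num_classes)
        num_classes = tf.shape(y_pred)[-1]
        # One-hot encode only this batch of labels, on the fly...
        y_true_one_hot = tf.one_hot(tf.cast(y_true, tf.int32), depth=num_classes)
        # ...then apply the ordinary categorical crossentropy formula.
        # The temporary one-hot tensor is discarded after this step.
        return -tf.reduce_sum(y_true_one_hot * tf.math.log(y_pred + 1e-7), axis=-1)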

Answer 2

Score: 1

The formula for categorical crossentropy is the following:

loss = -sum_i( y_true_i * ln(y_pred_i) )

Where y_true is the ground truth data and y_pred is your model's predictions.

The bigger the dimensions of y_true and y_pred, the more memory is necessary to perform all these operations.

But notice an interesting trick in this formula: only one of the entries in y_true is 1, and all the rest are zeros! This means only one term in the sum is non-zero.
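For example (made-up numbers), with y_true = (0, 1, 0) and y_pred = (0.2, 0.7, 0.1):

    loss = -(0 * ln(0.2) + 1 * ln(0.7) + 0 * ln(0.1)) = -ln(0.7) ≈ 0.357

Only the column where y_true is 1 matters.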

What a sparse formula does is:

  • Avoid the need for a huge matrix for y_true, using only indices instead of one-hot encoding.
  • Pick from y_pred only the column corresponding to each index, instead of performing calculations for the entire tensor.

So, the main idea of a sparse formula here is:

  • Gather columns from y_pred using the indices in y_true.

  • Calculate only the term -ln(y_pred_selected_columns), as in the sketch below.
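Here is a minimal sketch of that gather-based idea (an illustration under the assumptions above, not the actual TF source; the function name is made up):

    import tensorflow as tf

    def sparse_cce_via_gather(y_true, y_pred):
        # y_true: integer class indices, shape (batch,)
        # y_pred: predicted probabilities, shape (batch, num_classes)
        batch = tf.range(tf.shape(y_pred)[0])
        # For each sample i, select y_pred[i, y_true[i]] ...
        idx = tf.stack([batch, tf.cast(y_true, tf.int32)], axis=1)
        selected = tf.gather_nd(y_pred, idx)   # shape (batch,)
        # ... and compute only the surviving term of the sum.
        return -tf.math.log(selected + 1e-7)

    # Quick check against the built-in loss (values should roughly agree):
    y_true = tf.constant([1, 2])
    y_pred = tf.constant([[0.1, 0.8, 0.1],
                          [0.2, 0.3, 0.5]])
    print(sparse_cce_via_gather(y_true, y_pred))
    print(tf.keras.losses.sparse_categorical_crossentropy(y_true, y_pred))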

huangapple
  • Posted on 2020-01-03 18:51:38
  • Please keep this link when reposting: https://go.coder-hub.com/59577258.html