What does the implementation of keras.losses.sparse_categorical_crossentropy look like?
Question
I found tf.keras.losses.sparse_categorical_crossentropy is an amazing class that helps me create a loss function for a neural network that has a large number of output classes. Without this it is impossible to train the model, as I found tf.keras.losses.categorical_crossentropy gave an out-of-memory error because of converting an index into a 1-hot vector of very large size.
However, I have trouble understanding how sparse_categorical_crossentropy avoids the big memory issue. I took a look at the TF code, but it is not easy to see what goes on under the hood.
So, could anyone give some high-level idea of implementing this? What does the implementation look like?
Thank you!
Answer 1
Score: 2
It does not do anything special: it just produces the one-hot encoded labels inside the loss for a batch of data (not all the data at the same time), when they are needed, and then discards the result. So it's just a classic trade-off between memory and computation.
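A minimal NumPy sketch of this idea (an illustration, not TensorFlow's actual code): the sparse version looks up the true class's probability directly from integer indices, while the dense version builds the one-hot matrix only for the current batch and discards it afterwards. Both give the same loss values.

```python
import numpy as np

def sparse_ce(y_true_idx, y_pred):
    # Gather the predicted probability of the true class for each sample,
    # then take -ln of it. No one-hot matrix is ever materialized.
    rows = np.arange(len(y_true_idx))
    return -np.log(y_pred[rows, y_true_idx])

def dense_ce(y_true_idx, y_pred, num_classes):
    # Build the one-hot matrix only for this batch, use it, then discard it.
    one_hot = np.eye(num_classes)[y_true_idx]
    return -np.sum(one_hot * np.log(y_pred), axis=1)

# Toy batch: 2 samples, 3 classes (rows of y_pred sum to 1).
y_pred = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1]])
y_true = np.array([0, 1])  # integer class indices

print(np.allclose(sparse_ce(y_true, y_pred),
                  dense_ce(y_true, y_pred, 3)))  # True
```

The memory saving comes from `one_hot` only ever existing per batch (or, in the sparse case, not at all), rather than for the whole dataset at once.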
Answer 2
Score: 1
The formula for categorical crossentropy is the following (per sample):

`loss = -sum(y_true * ln(y_pred))`

where `y_true` is the ground truth data and `y_pred` is your model's predictions.

The bigger the dimensions of `y_true` and `y_pred`, the more memory is necessary to perform all these operations.

But notice an interesting trick in this formula: only one of the neurons in `y_true` is 1, all the rest are zeros! This means we can assume that only one term in the sum is non-zero.

What a sparse formula does is:

- Avoid the need for a huge matrix for `y_true`, using only indices instead of one-hot encodings.
- Pick from `y_pred` only the column corresponding to the index, instead of performing calculations for the entire tensor.

So, the main idea of a sparse formula here is:

- Gather the columns from `y_pred` at the indices given by `y_true`.
- Calculate only the term `-ln(y_pred_selected_columns)`.
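The two steps above can be sketched in NumPy; `np.take_along_axis` stands in here for a gather op such as `tf.gather` (the variable names are illustrative, not from the Keras source):

```python
import numpy as np

# Batch of 2 samples over 4 classes (each row of y_pred sums to 1).
y_pred = np.array([[0.10, 0.60, 0.20, 0.10],
                   [0.05, 0.05, 0.80, 0.10]])
y_true = np.array([1, 2])  # integer class indices, no one-hot needed

# Step 1: gather, per row, the column of y_pred indexed by y_true.
selected = np.take_along_axis(y_pred, y_true[:, None], axis=1).squeeze(1)

# Step 2: the loss is just -ln of the gathered probabilities.
loss = -np.log(selected)
print(loss)  # -ln(0.6) and -ln(0.8)
```

Note that `y_true` here is only a length-2 vector of indices; a one-hot encoding of the same labels would be a 2x4 matrix, and the gap grows linearly with the number of classes.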
Comments