What does the implementation of keras.losses.sparse_categorical_crossentropy look like?
Question
I found tf.keras.losses.sparse_categorical_crossentropy is an amazing class that helps me create a loss function for a neural network that has a large number of output classes. Without this it is impossible to train the model, as I found tf.keras.losses.categorical_crossentropy gave an out-of-memory error because of converting an index into a 1-hot vector of very large size.
However, I have trouble understanding how sparse_categorical_crossentropy avoids the big memory issue. I took a look at the TF code, but it is not easy to see what goes on under the hood.
So, could anyone give some high-level idea of implementing this? What does the implementation look like?
Thank you!
Answer 1
Score: 2
It does not do anything special: it just produces the one-hot encoded labels inside the loss for a batch of data (not all the data at the same time), when they are needed, and then discards the result. So it's just a classic trade-off between memory and computation.
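A minimal NumPy sketch of this idea (an illustration, not TensorFlow's actual code): the sparse version looks up the true class's probability directly from integer indices, while the dense version builds the one-hot matrix only for the current batch and discards it afterwards. Both give the same loss values.

```python
import numpy as np

def sparse_ce(y_true_idx, y_pred):
    # Gather the predicted probability of the true class for each sample,
    # then take -ln of it. No one-hot matrix is ever materialized.
    rows = np.arange(len(y_true_idx))
    return -np.log(y_pred[rows, y_true_idx])

def dense_ce(y_true_idx, y_pred, num_classes):
    # Build the one-hot matrix only for this batch, use it, then discard it.
    one_hot = np.eye(num_classes)[y_true_idx]
    return -np.sum(one_hot * np.log(y_pred), axis=1)

# Toy batch: 2 samples, 3 classes (rows of y_pred sum to 1).
y_pred = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1]])
y_true = np.array([0, 1])  # integer class indices

print(np.allclose(sparse_ce(y_true, y_pred),
                  dense_ce(y_true, y_pred, 3)))  # True
```

The memory saving comes from `one_hot` only ever existing per batch (or, in the sparse case, not at all), rather than for the whole dataset at once.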
Answer 2
Score: 1
The formula for categorical crossentropy is the following (per sample):

`loss = -sum(y_true * ln(y_pred))`

where `y_true` is the ground truth data and `y_pred` is your model's predictions.

The bigger the dimensions of `y_true` and `y_pred`, the more memory is necessary to perform all these operations.

But notice an interesting trick in this formula: only one of the neurons in `y_true` is 1, all the rest are zeros! This means we can assume that only one term in the sum is non-zero.

What a sparse formula does is:

- Avoid the need for a huge matrix for `y_true`, using only indices instead of one-hot encodings.
- Pick from `y_pred` only the column corresponding to the index, instead of performing calculations for the entire tensor.

So, the main idea of a sparse formula here is:

- Gather the columns from `y_pred` at the indices given by `y_true`.
- Calculate only the term `-ln(y_pred_selected_columns)`.
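The two steps above can be sketched in NumPy; `np.take_along_axis` stands in here for a gather op such as `tf.gather` (the variable names are illustrative, not from the Keras source):

```python
import numpy as np

# Batch of 2 samples over 4 classes (each row of y_pred sums to 1).
y_pred = np.array([[0.10, 0.60, 0.20, 0.10],
                   [0.05, 0.05, 0.80, 0.10]])
y_true = np.array([1, 2])  # integer class indices, no one-hot needed

# Step 1: gather, per row, the column of y_pred indexed by y_true.
selected = np.take_along_axis(y_pred, y_true[:, None], axis=1).squeeze(1)

# Step 2: the loss is just -ln of the gathered probabilities.
loss = -np.log(selected)
print(loss)  # -ln(0.6) and -ln(0.8)
```

Note that `y_true` here is only a length-2 vector of indices; a one-hot encoding of the same labels would be a 2x4 matrix, and the gap grows linearly with the number of classes.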
Comments