2023-03-07 17:51:38

sklearn LabelBinarizer: choosing which label (string) is the positive class

Question

From my tests with sklearn 0.24.1, it seems that LabelBinarizer sorts the strings in the data to be binarized and treats the last one in sorted order as the "positive" class. For instance:

from sklearn import preprocessing
# Labels sort to ['a', 'd'], so the later one, 'd', is encoded as 1
lb = preprocessing.LabelBinarizer()
print(lb.fit_transform(['a', 'd', 'd', 'a']))
# Labels sort to ['d', 'e'], so the later one, 'e', is encoded as 1
lb = preprocessing.LabelBinarizer()
print(lb.fit_transform(['e', 'd', 'd', 'e']))

In the first case 'd' is encoded as 1, and in the second case 'e' is encoded as 1. I haven't been able to find a parameter or any other way to specify which string (e.g. 'a' in the first case or 'd' in the second) should be the 1. I even tried ChatGPT, which gave odd results.

In any case, the following did the trick:

print(-1 * (lb.fit_transform(['e', 'd', 'd', 'e']) - 1))
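
The sorting behaviour described in the question can be checked by looking at the fitted `classes_` attribute: in the binary case the single output column is 1 for `classes_[1]`. Below is a minimal sketch of the flip workaround built on that observation. The helper name `binarize_with_positive` is hypothetical (it is not part of sklearn), and the sketch assumes exactly two distinct labels.

```python
from sklearn import preprocessing


def binarize_with_positive(labels, positive):
    """Hypothetical helper: return a 0/1 column where `positive` maps to 1.

    Assumes a binary problem (exactly two distinct labels).
    """
    lb = preprocessing.LabelBinarizer()
    encoded = lb.fit_transform(labels)
    if len(lb.classes_) != 2:
        raise ValueError("this sketch only handles two classes")
    if positive not in lb.classes_:
        raise ValueError(f"{positive!r} is not one of {list(lb.classes_)}")
    # classes_ holds the sorted unique labels; the output column is 1
    # for classes_[1], so flip 0 <-> 1 when the other label is wanted.
    if lb.classes_[1] != positive:
        encoded = 1 - encoded
    return encoded


result = binarize_with_positive(['e', 'd', 'd', 'e'], positive='d')
print(result.ravel())  # [0 1 1 0]
```

`1 - encoded` is equivalent to the `-1 * (encoded - 1)` expression above, just written more directly.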

Answer 1

Score: 1

If you're dealing with fixed classes, you can use label_binarize() instead:

from sklearn import preprocessing
print(preprocessing.label_binarize(['e', 'd', 'd', 'e'], classes=['e', 'd']))

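It is worth noting that this also allows label encoding in a custom order: with three or more classes, apply argmax(axis=1) to the result; with two classes, the single output column is already the index of each label in classes.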
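
As a rough illustration of the custom-order idea, the sketch below runs `label_binarize()` on a binary and a three-class input. The three-class labels and the `custom_order` list are made up for illustration and do not come from the original post.

```python
from sklearn.preprocessing import label_binarize

# Binary case: the last entry of `classes` ('d') becomes the positive label.
binary = label_binarize(['e', 'd', 'd', 'e'], classes=['e', 'd'])
print(binary.ravel())  # [0 1 1 0]

# Multiclass case: the columns follow the order given in `classes`,
# so argmax over each row is the index of the label in that order.
labels = ['b', 'c', 'a', 'c']    # illustrative data
custom_order = ['c', 'a', 'b']   # illustrative order
one_hot = label_binarize(labels, classes=custom_order)
codes = one_hot.argmax(axis=1)
print(codes)  # [2 0 1 0]
```

Here 'c' maps to 0, 'a' to 1 and 'b' to 2, which matches their positions in `custom_order` rather than alphabetical order.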

Answer 2

Score: 0

The order of the class labels is stored in the LabelBinarizer.classes_ attribute.

After fit(), it should be safe to reorder its elements if the learned order does not match your requirements.

from sklearn import preprocessing
import numpy
lb = preprocessing.LabelBinarizer()
print(lb.fit_transform(['a', 'd', 'd', 'a']))
# Take a look at the learned order
print(lb.classes_)
# Invert it
lb.classes_ = numpy.asarray(['d', 'a'])
print(lb.transform(['a', 'd', 'd', 'a']))
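
Since overwriting `classes_` relies on how the fitted estimator uses that attribute internally, it may be worth checking on your own sklearn version that `transform()` and `inverse_transform()` still agree after the reorder. A minimal sketch of such a round-trip check, under that assumption:

```python
from sklearn import preprocessing

labels = ['a', 'd', 'd', 'a']
lb = preprocessing.LabelBinarizer().fit(labels)

# Reverse the learned order ['a', 'd'] so that 'a' becomes the positive label.
lb.classes_ = lb.classes_[::-1].copy()

encoded = lb.transform(labels)
decoded = lb.inverse_transform(encoded)

print(encoded.ravel())
print(list(decoded))

# The round trip should give back the original labels.
assert list(decoded) == labels
```

If the final assertion fails on a given version, `label_binarize()` with an explicit `classes` argument avoids touching fitted attributes altogether.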
