sklearn LabelBinarizer: choosing which label (string) is the positive class

Question

From my tests with sklearn 0.24.1, it seems that LabelBinarizer sorts the strings in the data to be binarized and chooses the last value as the "positive" one. For instance:

from sklearn import preprocessing

lb = preprocessing.LabelBinarizer()
print(lb.fit_transform(['a', 'd', 'd', 'a']))

lb = preprocessing.LabelBinarizer()
print(lb.fit_transform(['e', 'd', 'd', 'e']))

In the first case 'd' is encoded as 1, and in the second case 'e' is encoded as 1. I haven't been able to find a parameter or any other way to specify which string (e.g. 'a' in the first case or 'd' in the second) should be the 1. I even tried ChatGPT, which gave strange results.

In any case, the following did the trick:

print(-1 * (lb.fit_transform(['e', 'd', 'd', 'e']) - 1))
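
The behaviour described above can be checked by inspecting the fitted classes_ attribute. The following is a minimal sketch, assuming a reasonably recent scikit-learn; the variable names (labels, encoded, flipped) are only for illustration. It also shows that the -1 * (x - 1) workaround is the same as computing 1 - x.

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer

labels = ['e', 'd', 'd', 'e']

lb = LabelBinarizer()
encoded = lb.fit_transform(labels)

# classes_ holds the unique labels in sorted order; in the binary case
# the single output column marks the last entry of classes_.
print(lb.classes_)       # sorted unique labels
print(encoded.ravel())   # 1 where the label equals lb.classes_[-1]

# Swapping 0 and 1 is equivalent to the -1 * (x - 1) trick.
flipped = 1 - encoded
print(flipped.ravel())
```

Note that the flip only makes sense for the binary case, where the output is a single column.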

Answer 1

Score: 1

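If you're dealing with a fixed set of classes, you can use label_binarize() instead. In the binary case, the last entry of classes is the one encoded as 1:

print(preprocessing.label_binarize(['e', 'd', 'd', 'e'], classes=['e', 'd']))

It is worth noting that this also allows label encoding in a custom order if you apply argmax() to the result (with three or more classes, where the output has one column per class).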
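
As a rough illustration of the argmax() idea, here is a small sketch with three made-up classes ('medium', 'low', 'high' are hypothetical labels, not from the question). It assumes every label in y appears in classes.

```python
import numpy as np
from sklearn.preprocessing import label_binarize

# Hypothetical three-class example; the order of `classes` is the custom order.
classes = ['medium', 'low', 'high']
y = ['low', 'high', 'medium', 'low']

# One column per class, in the order given by `classes`.
one_hot = label_binarize(y, classes=classes)

# The column index of the 1 in each row is the label's position in `classes`.
codes = one_hot.argmax(axis=1)
print(codes)
```

Two caveats: with only two classes the output is a single column, so argmax() would always return 0; and a label missing from classes produces an all-zero row, which argmax() cannot distinguish from the first class.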

Answer 2

Score: 0

The order of the class labels is stored in the LabelBinarizer.classes_ attribute.

After fit(X), it should be safe to reorder its elements if the learned order doesn't match your requirements.

from sklearn import preprocessing

import numpy

lb = preprocessing.LabelBinarizer()
print(lb.fit_transform(['a', 'd', 'd', 'a']))

# Take a look at the learned order
print(lb.classes_)

# Invert it
lb.classes_ = numpy.asarray(["d", "a"])

print(lb.transform(['a', 'd', 'd', 'a']))
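
One way to gain some confidence in the reordering is a round trip through inverse_transform(). This is only a sketch: overwriting classes_ seems to rely on implementation details rather than documented behaviour, so it may be worth re-running a check like this after upgrading scikit-learn.

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer

labels = ['a', 'd', 'd', 'a']

lb = LabelBinarizer().fit(labels)
lb.classes_ = np.asarray(['d', 'a'])   # reorder so that 'a' comes last

encoded = lb.transform(labels)            # 'a' should now be encoded as 1
decoded = lb.inverse_transform(encoded)   # should give back the original labels

print(encoded.ravel())
print(decoded)
```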

huangapple
  • Posted on March 7, 2023, 17:51:38
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75660356.html