sklearn LabelBinarizer: choosing which label (string) is the positive class
Question
From my tests with sklearn 0.24.1, it seems that LabelBinarizer sorts the strings in the data to be binarized and chooses the last value as "positive". For instance:
from sklearn import preprocessing
lb = preprocessing.LabelBinarizer()
print(lb.fit_transform(['a', 'd', 'd', 'a']))
lb = preprocessing.LabelBinarizer()
print(lb.fit_transform(['e', 'd', 'd', 'e']))
In the first case 'd' is encoded as 1, and in the second case 'e' is encoded as 1. I haven't been able to find a parameter or any other way to specify which string (e.g. 'a' in the first case or 'd' in the second) should be encoded as 1. I even tried ChatGPT, which returned odd results.
In any case, the following did the trick:
print(-1 * (lb.fit_transform(['e', 'd', 'd', 'e']) - 1))
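To make the observed behaviour concrete, here is a minimal sketch that prints the learned class order and flips the encoding. It assumes the default neg_label=0 and pos_label=1; the variable names encoded and flipped are illustrative only.

```python
from sklearn import preprocessing

lb = preprocessing.LabelBinarizer()
encoded = lb.fit_transform(['e', 'd', 'd', 'e'])

# classes_ holds the labels in sorted order; in the two-class case
# the last entry is the one encoded as 1.
print(lb.classes_)
print(encoded.ravel())

# With labels 0 and 1, swapping them is the same as subtracting from 1,
# which is what -1 * (x - 1) computes.
flipped = 1 - encoded
print(flipped.ravel())
```

The flip only makes sense for the default 0/1 encoding; with other neg_label or pos_label values the arithmetic would need to change.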
Answer 1
Score: 1
If you're dealing with fixed classes, you can use label_binarize() instead:
print(preprocessing.label_binarize(['e', 'd', 'd', 'e'], classes=['e', 'd']))
It's worth noting that this also allows for label encoding in a custom order if you apply argmax() to the result.
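The argmax() remark is easiest to see with more than two classes, where label_binarize() returns one column per class in the order given. A small sketch, with made-up labels and a made-up custom_order list:

```python
from sklearn.preprocessing import label_binarize

# Hypothetical labels and class order, chosen only for illustration.
custom_order = ['c', 'a', 'b']
labels = ['b', 'a', 'c', 'a']

# One column per class, laid out in the order given by `classes`.
one_hot = label_binarize(labels, classes=custom_order)

# The position of the 1 in each row is the label's index in custom_order.
codes = one_hot.argmax(axis=1)
print(codes)
```

With exactly two classes the result has a single column, so argmax() is not useful there; the column itself already is the 0/1 code.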
Answer 2
Score: 0
The order of the class labels is stored in the LabelBinarizer.classes_ attribute.
After fit(X), it should be totally safe to reorder its elements if the learned order doesn't match your requirements.
from sklearn import preprocessing
import numpy
lb = preprocessing.LabelBinarizer()
print(lb.fit_transform(['a', 'd', 'd', 'a']))
# Take a look at the learned order
print(lb.classes_)
# Invert it
lb.classes_ = numpy.asarray(["d", "a"])
print(lb.transform(['a', 'd', 'd', 'a']))
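One way to check that the reordering is consistent is to round-trip the labels through transform() and inverse_transform(). This is only a sketch: it relies on the current implementation reading classes_ at call time, which is an assumption and not a documented guarantee.

```python
import numpy
from sklearn import preprocessing

lb = preprocessing.LabelBinarizer()
lb.fit(['a', 'd', 'd', 'a'])

# Reverse the learned order so that 'a' becomes the positive label.
lb.classes_ = numpy.asarray(['d', 'a'])

encoded = lb.transform(['a', 'd', 'd', 'a'])
decoded = lb.inverse_transform(encoded)

print(encoded.ravel())
print(decoded)
```

If decoded matches the original labels, encoding and decoding agree on the new order.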