sklearn LabelBinarizer: choosing which label (string) is the positive class

Question

From my tests with sklearn 0.24.1, it seems that LabelBinarizer sorts the strings in the data to be binarized and picks the last value as the "positive" one. For instance:

    from sklearn import preprocessing

    lb = preprocessing.LabelBinarizer()
    print(lb.fit_transform(['a', 'd', 'd', 'a']))

    lb = preprocessing.LabelBinarizer()
    print(lb.fit_transform(['e', 'd', 'd', 'e']))

In the first case 'd' is encoded as 1, and in the second case 'e' is encoded as 1. I haven't been able to find a parameter or any other way to specify which string (e.g. 'a' in the first case or 'd' in the second) should be the 1. I even tried ChatGPT, which came up with weird results.

In any case, the following did the trick:

    print(-1 * (lb.fit_transform(['e', 'd', 'd', 'e']) - 1))
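
The behaviour described in the question can be checked directly. The sketch below is only an illustration (the variable names are made up for this example): it prints the fitted classes_ attribute, which holds the labels in sorted order, and shows that the workaround above amounts to flipping 0 and 1.

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer

labels = ['e', 'd', 'd', 'e']

lb = LabelBinarizer()
encoded = lb.fit_transform(labels)

# classes_ holds the labels in sorted order; in the binary case the
# second entry is the one encoded as 1.
print(lb.classes_)
print(encoded.ravel())

# Flipping 0 and 1 makes the first sorted label ('d') the positive one.
# This gives the same result as -1 * (encoded - 1).
flipped = 1 - encoded
print(flipped.ravel())
```

With two classes, 1 - encoded is just a shorter way of writing the expression from the question.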

Answer 1

Score: 1

If you're dealing with fixed classes, you can use label_binarize() instead:

    from sklearn import preprocessing
    print(preprocessing.label_binarize(['e', 'd', 'd', 'e'], classes=['e', 'd']))

It's worth noting that this also allows label encoding in a custom order if you apply argmax() to the result.
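
A minimal sketch of the argmax() idea, under the assumption that there are three or more classes: with exactly two classes label_binarize() returns a single column (the last entry of classes is the one encoded as 1), so argmax() would not be informative there. The labels and the custom order below are hypothetical.

```python
import numpy as np
from sklearn.preprocessing import label_binarize

labels = ['b', 'a', 'c', 'a']
custom_order = ['c', 'a', 'b']  # hypothetical order chosen for this example

# One column per class, in the order given by custom_order.
one_hot = label_binarize(labels, classes=custom_order)

# The index of the 1 in each row is the position of the label in custom_order.
codes = one_hot.argmax(axis=1)
print(codes)
```

Here 'c' maps to 0, 'a' to 1 and 'b' to 2, following custom_order rather than alphabetical order.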

Answer 2

Score: 0

The order of the class labels is stored in the LabelBinarizer.classes_ attribute.

After fit(X), it should be safe to reorder its elements if the learned order doesn't match your requirements.

    from sklearn import preprocessing
    import numpy

    lb = preprocessing.LabelBinarizer()
    print(lb.fit_transform(['a', 'd', 'd', 'a']))

    # Take a look at the learned order
    print(lb.classes_)

    # Invert it
    lb.classes_ = numpy.asarray(['d', 'a'])
    print(lb.transform(['a', 'd', 'd', 'a']))
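
One way to gain some confidence in the reordering is a round trip through inverse_transform(). The sketch below assumes that overwriting a fitted attribute keeps working; that is an implementation detail rather than a documented feature, so it is worth re-checking after upgrading scikit-learn.

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer

labels = ['a', 'd', 'd', 'a']

lb = LabelBinarizer().fit(labels)

# Reverse the learned (sorted) order so that 'a' becomes the positive class.
# Assumption: mutating the fitted classes_ attribute is tolerated.
lb.classes_ = lb.classes_[::-1].copy()

encoded = lb.transform(labels)
decoded = lb.inverse_transform(encoded)

print(encoded.ravel())
print(decoded)
```

If decoded matches the original labels, transform() and inverse_transform() are at least consistent with each other under the new order.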

huangapple
  • Posted on March 7, 2023, 17:51:38
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75660356.html