July 10, 2023 14:34:52

Using smoothed labels from 0 to 1 to train an XGB classifier

Question

The first code example does take the smoothed labels into account during training; it does not internally convert the real values to 0 or 1. In the provided code you train the model with the XGBoost native interface (xgb.train) and define the labels as float values between 0 and 1 in the train_label and test_label arrays. With the binary:logistic objective, the native interface accepts any label in the range [0, 1] and uses it directly in the log-loss, so the smoothed values shape the gradients the trees are fitted to.
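
A small plain-Python sketch makes this concrete. It uses the standard log-loss gradient p - y and hessian p * (1 - p) with respect to the raw margin, which, as far as I know, is what binary:logistic computes; sigmoid, logistic_grad_hess and fit_constant_margin are hypothetical helpers written for illustration, not xgboost functions. Fitting one shared margin to the labels 0.2, 0.3 and 0.7 converges to probability 0.4 (their mean), whereas rounding the labels to 0 or 1 first would give 1/3.

```python
import math


def sigmoid(x):
    """Map a raw margin to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def logistic_grad_hess(margin, label):
    """Gradient and hessian of the log-loss with respect to the raw margin.

    label may be any value in [0, 1]; it is used as-is, never rounded.
    """
    p = sigmoid(margin)
    return p - label, p * (1.0 - p)


def fit_constant_margin(labels, steps=50):
    """Newton's method on a single shared margin (hypothetical helper).

    The optimum is the constant prediction that minimises the summed log-loss.
    """
    margin = 0.0
    for _ in range(steps):
        grad = sum(logistic_grad_hess(margin, y)[0] for y in labels)
        hess = sum(logistic_grad_hess(margin, y)[1] for y in labels)
        margin -= grad / hess
    return margin


soft_labels = [0.2, 0.3, 0.7]
rounded_labels = [round(y) for y in soft_labels]  # [0, 0, 1]

print(round(sigmoid(fit_constant_margin(soft_labels)), 6))     # 0.4
print(round(sigmoid(fit_constant_margin(rounded_labels)), 6))  # 0.333333
```

The gradient is zero exactly where the predicted probability equals the label, so a label of 0.3 pulls the prediction towards 0.3 rather than towards 0.
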
The scikit-learn wrapper XGBClassifier does not accept smoothed labels because it treats y as class labels: it infers the classes from the unique values of train_label and requires them to be the consecutive integers 0, 1, ..., n_classes - 1 (that is, 0 and 1 for binary classification). The error you encountered says exactly that, since every distinct float in train_label was counted as a separate class. To use XGBClassifier, you would first need to convert the labels to binary, typically with a threshold (e.g., 0.5): values at or above the threshold become class 1 and values below it become class 0. This throws away the information carried by the smoothed values, so the native interface from your first example remains the more direct way to train on them.
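
The failing check can be re-created in a few lines of plain Python. check_classes and binarize are hypothetical helpers written for illustration, not xgboost functions; check_classes only mirrors the comparison visible in the traceback (classes inferred from the unique values of y versus the expected 0..n_classes-1), and binarize applies the same rule as np.where(y >= 0.5, 1, 0).

```python
def check_classes(y):
    """Hypothetical re-creation of the check in the traceback: the sorted
    unique values of y must be exactly 0, 1, ..., n_classes - 1."""
    classes = sorted(set(y))
    expected = list(range(len(classes)))
    if classes != expected:
        raise ValueError(
            "Invalid classes inferred from unique values of `y`.  "
            f"Expected: {expected}, got {classes}"
        )
    return classes


def binarize(y, threshold=0.5):
    """Values at or above the threshold become 1, all others become 0."""
    return [1 if v >= threshold else 0 for v in y]


smoothed = [0.12, 0.55, 0.5, 0.91]
try:
    check_classes(smoothed)  # four distinct floats are four "classes"
except ValueError as err:
    print(err)
print(check_classes(binarize(smoothed)))  # [0, 1]
```

Note that thresholding only passes the check if both sides of the threshold are represented; if every label is at or above 0.5, the single inferred class is 1 and the comparison against [0] fails again.
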

I want to train an XGB classifier using smoothed labels between 0 and 1 instead of binary labels.

The native XGB model seems to be able to accept smoothed labels for a binary classifier.

from xgboost import XGBClassifier
import numpy as np
import xgboost as xgb
train_data = np.random.rand(20, 10)
train_label = np.random.random(20)
dtrain = xgb.DMatrix(train_data, label=train_label)
test_data = np.random.rand(20, 10)
test_label = np.random.random(20)
dtest = xgb.DMatrix(test_data, label=test_label)
param = {'max_depth': 2, 'eta': 1, 'objective': 'binary:logistic', 'eval_metric': 'auc'}
evallist = [(dtrain, 'train'), (dtest, 'eval')]
bst = xgb.train(params=param, dtrain=dtrain, num_boost_round=10, evals=evallist)
[0]	train-auc:0.68952	eval-auc:0.53327
[1]	train-auc:0.74847	eval-auc:0.49597
[2]	train-auc:0.79158	eval-auc:0.45795
...

However, when I tried to use the sklearn wrapper XGBClassifier, I got the following error.


model = XGBClassifier(**param)
model.fit(train_data, train_label)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/tmp/ipykernel_12603/1675654556.py in <cell line: 1>()
----> 1 model.fit(train_data, train_label)
~/.pyenv/versions/btc-p2p/lib/python3.9/site-packages/xgboost/core.py in inner_f(*args, **kwargs)
    618             for k, arg in zip(sig.parameters, args):
    619                 kwargs[k] = arg
--> 620             return func(**kwargs)
    621 
    622         return inner_f
~/.pyenv/versions/btc-p2p/lib/python3.9/site-packages/xgboost/sklearn.py in fit(self, X, y, sample_weight, base_margin, eval_set, eval_metric, early_stopping_rounds, verbose, xgb_model, sample_weight_eval_set, base_margin_eval_set, feature_weights, callbacks)
   1464                 or not (self.classes_ == expected_classes).all()
   1465             ):
-> 1466                 raise ValueError(
   1467                     f"Invalid classes inferred from unique values of `y`.  "
   1468                     f"Expected: {expected_classes}, got {self.classes_}"
ValueError: Invalid classes inferred from unique values...

I have two questions:

1. Does the first code example actually take the smoothed labels into account during training, or does it just internally convert the real values to 0 or 1?
2. Why doesn't XGBClassifier work with smoothed labels? Is it possible to get it to work?

Answer 1

Score: 1

Answer 1: In the first code example, train_label and test_label are randomly generated values between 0 and 1, so no smoothing is applied within the code. XGB internally interprets these labels as 0 and 1 using a sigmoid function.

Answer 2: XGBClassifier doesn't work with smoothed labels because it expects binary labels for classification tasks.

To convert smoothed labels into binary labels, you can preprocess them with a threshold value.

Smoothed to Binary

from xgboost import XGBClassifier
import numpy as np
import xgboost as xgb
train_data = np.random.rand(20, 10)
train_label = np.random.random(20)
train_label_binary = np.where(train_label >= 0.5, 1, 0)  # Apply threshold to convert smoothed labels to binary labels
dtrain = xgb.DMatrix(train_data, label=train_label_binary)
test_data = np.random.rand(20, 10)
test_label = np.random.random(20)
test_label_binary = np.where(test_label >= 0.5, 1, 0)  # Apply threshold to convert smoothed labels to binary labels
dtest = xgb.DMatrix(test_data, label=test_label_binary)
param = {'max_depth': 2, 'eta': 1, 'objective': 'binary:logistic', 'eval_metric': 'auc'}
evallist = [(dtrain, 'train'), (dtest, 'eval')]
bst = xgb.train(params=param, dtrain=dtrain, num_boost_round=10, evals=evallist)

Output:

[0]	train-auc:0.80500	eval-auc:0.51000
[1]	train-auc:0.93500	eval-auc:0.61500
[2]	train-auc:0.95000	eval-auc:0.67500
[3]	train-auc:1.00000	eval-auc:0.58000
[4]	train-auc:1.00000	eval-auc:0.57500
[5]	train-auc:1.00000	eval-auc:0.57500
[6]	train-auc:1.00000	eval-auc:0.57500
[7]	train-auc:1.00000	eval-auc:0.61500
[8]	train-auc:1.00000	eval-auc:0.60000
[9]	train-auc:1.00000	eval-auc:0.62000
