Using smoothed labels from 0 to 1 to train an XGB classifier
Question
I want to train an XGB classifier using smoothed labels between 0 and 1 instead of binary labels.
The native XGB model seems to be able to accept smoothed labels for a binary classifier.
from xgboost import XGBClassifier
import numpy as np
import xgboost as xgb
train_data = np.random.rand(20, 10)
train_label = np.random.random(20)
dtrain = xgb.DMatrix(train_data, label=train_label)
test_data = np.random.rand(20, 10)
test_label = np.random.random(20)
dtest = xgb.DMatrix(test_data, label=test_label)
param = {'max_depth': 2, 'eta': 1, 'objective': 'binary:logistic', 'eval_metric': 'auc'}
evallist = [(dtrain, 'train'), (dtest, 'eval')]
bst = xgb.train(params=param, dtrain=dtrain, num_boost_round=10, evals=evallist)
[0] train-auc:0.68952 eval-auc:0.53327
[1] train-auc:0.74847 eval-auc:0.49597
[2] train-auc:0.79158 eval-auc:0.45795
...
However, when I tried to use the sklearn wrapper XGBClassifier, I got the following error.
model = XGBClassifier(**param)
model.fit(train_data, train_label)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
/tmp/ipykernel_12603/1675654556.py in <cell line: 1>()
----> 1 model.fit(train_data, train_label)
~/.pyenv/versions/btc-p2p/lib/python3.9/site-packages/xgboost/core.py in inner_f(*args, **kwargs)
618 for k, arg in zip(sig.parameters, args):
619 kwargs[k] = arg
--> 620 return func(**kwargs)
621
622 return inner_f
~/.pyenv/versions/btc-p2p/lib/python3.9/site-packages/xgboost/sklearn.py in fit(self, X, y, sample_weight, base_margin, eval_set, eval_metric, early_stopping_rounds, verbose, xgb_model, sample_weight_eval_set, base_margin_eval_set, feature_weights, callbacks)
1464 or not (self.classes_ == expected_classes).all()
1465 ):
-> 1466 raise ValueError(
1467 f"Invalid classes inferred from unique values of `y`. "
1468 f"Expected: {expected_classes}, got {self.classes_}"
ValueError: Invalid classes inferred from unique values...
I have 2 questions here:
- Does the 1st code example actually take the smoothed labels into account during training, or does it just internally convert the real values to 0 or 1?
- Why doesn't the XGBClassifier method work with smoothed labels? Is it possible to get it to work?
Answer 1
Score: 1
Answer 1: In the first code example, train_label and test_label are randomly generated, producing values between 0 and 1. Hence the labels are not smoothed within the code; XGB internally interprets these labels as 0 and 1 using a sigmoid function.
Answer 2: XGBClassifier doesn't work with smoothed labels, as it expects binary labels for classification tasks.
To convert smoothed labels into binary labels, you can consider pre-processing them with a threshold value.
Smoothed to Binary
from xgboost import XGBClassifier
import numpy as np
import xgboost as xgb
train_data = np.random.rand(20, 10)
train_label = np.random.random(20)
train_label_binary = np.where(train_label >= 0.5, 1, 0) # Apply threshold to convert smoothed labels to binary labels
dtrain = xgb.DMatrix(train_data, label=train_label_binary)
test_data = np.random.rand(20, 10)
test_label = np.random.random(20)
test_label_binary = np.where(test_label >= 0.5, 1, 0) # Apply threshold to convert smoothed labels to binary labels
dtest = xgb.DMatrix(test_data, label=test_label_binary)
param = {'max_depth': 2, 'eta': 1, 'objective': 'binary:logistic', 'eval_metric': 'auc'}
evallist = [(dtrain, 'train'), (dtest, 'eval')]
bst = xgb.train(params=param, dtrain=dtrain, num_boost_round=10, evals=evallist)
Output:
[0] train-auc:0.80500 eval-auc:0.51000
[1] train-auc:0.93500 eval-auc:0.61500
[2] train-auc:0.95000 eval-auc:0.67500
[3] train-auc:1.00000 eval-auc:0.58000
[4] train-auc:1.00000 eval-auc:0.57500
[5] train-auc:1.00000 eval-auc:0.57500
[6] train-auc:1.00000 eval-auc:0.57500
[7] train-auc:1.00000 eval-auc:0.61500
[8] train-auc:1.00000 eval-auc:0.60000
[9] train-auc:1.00000 eval-auc:0.62000
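For the second part of the question (whether XGBClassifier can be made to work), the same thresholding can be applied before calling the sklearn wrapper itself. The block below is a minimal sketch, not part of the original answer: it regenerates random data in the same shape as above, and the parameter mapping (eta to learning_rate, num_boost_round to n_estimators) and the model variable name are choices made for this sketch. Once the labels are binary, fit no longer raises the invalid-classes error.
from xgboost import XGBClassifier
import numpy as np

train_data = np.random.rand(20, 10)
train_label = np.random.random(20)                        # smoothed labels in [0, 1]
train_label_binary = np.where(train_label >= 0.5, 1, 0)   # threshold to hard 0/1 labels

# With binary labels, the sklearn wrapper can infer the expected classes [0, 1].
model = XGBClassifier(max_depth=2, learning_rate=1.0, n_estimators=10)
model.fit(train_data, train_label_binary)

# Predicted probability of class 1 for new data.
test_data = np.random.rand(20, 10)
print(model.predict_proba(test_data)[:, 1])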
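Answer 2
- The 1st code example actually takes the smoothed labels into account during training; it does not just internally convert the real values to 0 or 1. In the provided code, you are using the XGBoost native interface (xgb.train) to train the model, and you have defined your custom labels as float values between 0 and 1 in the train_label and test_label arrays. XGBoost's native interface allows you to specify such labels, so it takes these smoothed labels into account during the training process (a small empirical check is sketched after this list).
- The XGBClassifier method from scikit-learn (the sklearn wrapper) may not work with smoothed labels because it expects binary labels (0 or 1) for binary classification tasks. The error you encountered indicates that it inferred non-binary classes from the unique values in train_label. To make it work with smoothed labels, you would need to preprocess your labels to convert them into binary labels before using the XGBClassifier. Typically, you would set a threshold (e.g., 0.5) and consider values above the threshold as class 1 and values below as class 0. However, using the native XGBoost interface, as shown in your first code example, is a more direct way to handle smoothed labels without this extra step.
The check below is a minimal sketch, not part of the original answer: it trains one booster on the fractional labels and another on thresholded copies of the same labels, using the same data. If the fractional values were silently rounded to 0 or 1 internally, both boosters would see identical training targets and produce identical predictions; in general the predictions differ, which indicates that the smoothed labels do influence training.
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.random((200, 10))
y_soft = rng.random(200)                  # smoothed labels in [0, 1]
y_hard = np.where(y_soft >= 0.5, 1, 0)    # thresholded copies of the same labels

param = {'max_depth': 2, 'eta': 1, 'objective': 'binary:logistic'}
bst_soft = xgb.train(param, xgb.DMatrix(X, label=y_soft), num_boost_round=10)
bst_hard = xgb.train(param, xgb.DMatrix(X, label=y_hard), num_boost_round=10)

X_new = rng.random((5, 10))
print(bst_soft.predict(xgb.DMatrix(X_new)))  # probabilities from the soft-label booster
print(bst_hard.predict(xgb.DMatrix(X_new)))  # probabilities from the hard-label booster
# The two rows of probabilities generally differ, so the fractional labels are not simply rounded.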