Why does my accuracy score drop after hyperparameter tuning in XGBoost (multiclass model)?
Question
I am trying to tune the multiclass model I've built, but every time I change hyperparameters my accuracy score drops significantly. I'm using RandomizedSearchCV and best_params_ to determine which parameters I need to change. In this specific case, best_params_ recommends a learning rate of 0.29, which drops the accuracy score from 0.6238 to 0.6192. The code I use to tune the parameters is below:
from xgboost import XGBClassifier
from sklearn.model_selection import RandomizedSearchCV

xgb = XGBClassifier(booster='gbtree', objective='multi:softmax', random_state=42, eval_metric="auc",
                    num_class=num_of_classes, tree_method='gpu_hist', importance_type='gain')
xgb.fit(X_train, y_train)

params = {
    "colsample_bytree": [1],
    "gamma": [0],
    "learning_rate": [0.3, 0.29],
    "max_delta_step": [0],
    "max_depth": [6],
    "min_child_weight": [1],
    "n_jobs": [12],
    "subsample": [1]
}

clf = RandomizedSearchCV(xgb, param_distributions=params, n_iter=1000, scoring='accuracy', cv=10, verbose=3)
clf.fit(X, Y)
And this is the code for measuring accuracy:
from sklearn import preprocessing
from sklearn.metrics import accuracy_score

val = clf.predict(X_test)

# Binarize the labels before scoring (optional here: accuracy_score
# also accepts the raw multiclass label vectors directly).
lb = preprocessing.LabelBinarizer()
lb.fit(y_test)
y_test_lb = lb.transform(y_test)
val_lb = lb.transform(val)
accuracy_score(y_test_lb, val_lb)
Answer 1
Score: 1
The fluctuation of test accuracy is not necessarily a problem. The best hyperparameters are selected for the best CV score, which does not always correspond to the best test-set accuracy. In your example, the difference is very minor.
However, there appear to be multiple issues with the code you provided:
- It seems like your test set is getting leaked into the training procedure (see the sketch after this list).
- You are using 1000 iterations, but there are only 2 possible hyperparameter combinations. Is this just an example?
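A minimal sketch of a leak-free version of the search, assuming X, Y, and num_of_classes exist as in the question: the test set is held out before tuning, the search is fit on the training split only, and n_iter is reduced to match the size of the grid (with only 2 combinations, an exhaustive GridSearchCV would behave the same).

from xgboost import XGBClassifier
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Hold out the test set before any tuning so it never influences the search.
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2, random_state=42, stratify=Y)

xgb = XGBClassifier(booster='gbtree', objective='multi:softmax', random_state=42,
                    eval_metric="auc", num_class=num_of_classes,
                    tree_method='gpu_hist', importance_type='gain')

params = {"learning_rate": [0.3, 0.29]}  # grid of only 2 combinations

clf = RandomizedSearchCV(xgb, param_distributions=params, n_iter=2,
                         scoring='accuracy', cv=10, verbose=3)
clf.fit(X_train, y_train)  # tune on the training split only

print(clf.best_params_, clf.best_score_)  # best CV accuracy
print(clf.score(X_test, y_test))          # test accuracy, measured once

Because the search never sees X_test, the final score is an unbiased estimate rather than one inflated by leakage.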
Answer 2
Score: 1
The score difference is because you're using CV, so you should expect a different score: different splits of your data will produce different scores. The following will help more generally:
- Change clf.fit(X, Y) to clf.fit(X_train, y_train) to stop leaking data.
- Carefully monitor your training and testing scores, and use the differences to check whether overfitting, underfitting, leakage, etc. is happening.
- Think about introducing a third validation set. Split your data into training, validation, and holdout sets (60/20/20 is a sensible default). Train your models on the training set and tune your parameters by evaluating each model on the validation set. Then, once you are happy with a model's performance, do a final test on the holdout set to get a good representation of how it will perform in production (see the sketch after this list).
- Depending on the size of the data, and the variation within it, cv=10 could be too high, making it less practical.
- Consider not using accuracy for multiclass classification.
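A sketch of the 60/20/20 split together with a multiclass-friendly metric (macro-averaged F1 is one common choice), again assuming X, Y, and num_of_classes from the question:

from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

# 60/20/20 train/validation/holdout: hold out 20% first, then take
# 25% of the remaining 80% (i.e. 20% of the original data) for validation.
X_rest, X_hold, y_rest, y_hold = train_test_split(X, Y, test_size=0.20, random_state=42, stratify=Y)
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=42, stratify=y_rest)

model = XGBClassifier(objective='multi:softmax', num_class=num_of_classes, random_state=42)
model.fit(X_train, y_train)

# Compare train vs. validation scores to spot over-/underfitting;
# macro F1 weights every class equally, unlike plain accuracy.
print(f1_score(y_train, model.predict(X_train), average='macro'))
print(f1_score(y_val, model.predict(X_val), average='macro'))

# Only once tuning is finished, score a single time on the holdout set:
print(f1_score(y_hold, model.predict(X_hold), average='macro'))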