May 7, 2023, 17:00:55

Why does my accuracy score drop after hyperparameter tuning in XGBoost (multiclass model)?

Question

I am trying to tune a multiclass model I've built, but every time I change the hyperparameters my accuracy score drops significantly. I'm using RandomizedSearchCV and best_params_ to decide which parameters to change. In this specific case, best_params_ recommends a learning rate of 0.29, yet the accuracy score drops from 0.6238 to 0.6192. The code I use to tune the parameters is below:

from sklearn.model_selection import RandomizedSearchCV
from xgboost import XGBClassifier

# Baseline model, fitted on the training split
xgb = XGBClassifier(booster='gbtree', objective='multi:softmax', random_state=42, eval_metric="auc",
                    num_class=num_of_classes, tree_method='gpu_hist', importance_type='gain')
xgb.fit(X_train, y_train)

# Search space: only learning_rate has more than one candidate value
params = {
    "colsample_bytree": [1],
    "gamma": [0],
    "learning_rate": [0.3, 0.29],
    "max_delta_step": [0],
    "max_depth": [6],
    "min_child_weight": [1],
    "n_jobs": [12],
    "subsample": [1],
}
clf = RandomizedSearchCV(xgb, param_distributions=params, n_iter=1000, scoring='accuracy', cv=10, verbose=3)
clf.fit(X, Y)

And this is the code for measuring accuracy:

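from sklearn import preprocessing
from sklearn.metrics import accuracy_score

# Predict on the test split, then one-hot encode labels and predictions before scoring
val = clf.predict(X_test)
lb = preprocessing.LabelBinarizer()
lb.fit(y_test)
y_test_lb = lb.transform(y_test)
val_lb = lb.transform(val)
accuracy_score(y_test_lb, val_lb)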

Answer 1

Score: 1

Fluctuation in test accuracy is not necessarily a problem. The best hyperparameters are chosen by the best CV score, which does not always correspond to the best test-set accuracy. In your example, the difference is very minor.

However, there appear to be multiple issues with the code you provided:

It seems that your test set is leaking into the training procedure (the search is fitted on X, Y rather than X_train, y_train).
You are using 1000 iterations, but there are only 2 possible hyperparameter combinations. Is this just an example?
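
Both points can be checked with a small sketch. It is only an illustration under assumptions: the data is synthetic (make_classification stands in for your X and Y), and scikit-learn's GradientBoostingClassifier stands in for XGBClassifier so that it runs without a GPU or an xgboost install. The search space is copied from the question. As far as I know, when every parameter is given as a list and n_iter exceeds the number of combinations, RandomizedSearchCV warns and only tries the combinations that exist, so GridSearchCV is the more natural choice here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, ParameterGrid, train_test_split

# The search space from the question: only learning_rate has two values.
params = {
    "colsample_bytree": [1],
    "gamma": [0],
    "learning_rate": [0.3, 0.29],
    "max_delta_step": [0],
    "max_depth": [6],
    "min_child_weight": [1],
    "n_jobs": [12],
    "subsample": [1],
}
n_combinations = len(ParameterGrid(params))
print("distinct combinations:", n_combinations)

# ASSUMPTION: synthetic 3-class data standing in for the asker's X and Y.
X, Y = make_classification(n_samples=600, n_features=10, n_informative=6,
                           n_classes=3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, Y, test_size=0.2, stratify=Y, random_state=42)

# ASSUMPTION: GradientBoostingClassifier is a stand-in for XGBClassifier,
# so only the parameters it understands are searched.
search = GridSearchCV(
    GradientBoostingClassifier(n_estimators=20, max_depth=3, random_state=42),
    param_grid={"learning_rate": [0.3, 0.29]},
    scoring="accuracy",
    cv=5,
)
# Fit on the training split only, so X_test never influences the tuning.
search.fit(X_train, y_train)

cv_accuracy = search.best_score_  # mean accuracy over the CV folds
test_accuracy = accuracy_score(y_test, search.predict(X_test))
print("best params:", search.best_params_)
print("CV accuracy:", round(cv_accuracy, 4))
print("test accuracy:", round(test_accuracy, 4))
```

The CV accuracy and the test accuracy will usually differ a little, which is the fluctuation described above. Only the test figure is an unbiased estimate, and only as long as X_test was kept out of search.fit.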

Answer 2

Score: 1

The score difference arises because you're using CV: different splits of your data produce different scores, so you should expect some variation. The following will help more generally:

Change clf.fit(X,Y) to clf.fit(X_train,y_train) to stop leaking data.
Carefully monitor your training and testing scores, and use the differences between them to check whether overfitting, underfitting, leakage, etc. is happening.
Think about introducing a third validation set. Split your data into training, validation, and holdout sets (60/20/20 is a sensible default). Train your models on the training set and tune your parameters by evaluating the model on the validation set. Then, once you are happy with your model's performance, do a final test on the holdout set to get a good representation of how your model will perform in production.
Depending on the size of the data and the variation within it, cv=10 could be too high, making it less practical.
Consider not using accuracy for multiclass classification.
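
A rough sketch of the 60/20/20 split and of metrics other than plain accuracy follows. It rests on assumptions: the data is synthetic, GradientBoostingClassifier stands in for XGBClassifier, and balanced accuracy and macro F1 are just two possible alternatives to accuracy, not the only sensible ones.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, balanced_accuracy_score, f1_score
from sklearn.model_selection import train_test_split

# ASSUMPTION: synthetic 3-class data standing in for the real dataset.
X, y = make_classification(n_samples=1000, n_features=10, n_informative=6,
                           n_classes=3, random_state=42)

# Hold out 20% first; 25% of the remaining 80% is 20% of the total.
X_rest, X_hold, y_rest, y_hold = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, stratify=y_rest, random_state=42)

# Tune on the validation set only (stand-in estimator, see assumptions above).
best_lr, best_score = None, -1.0
for lr in (0.3, 0.29):
    model = GradientBoostingClassifier(n_estimators=20, learning_rate=lr,
                                       random_state=42)
    model.fit(X_train, y_train)
    score = balanced_accuracy_score(y_val, model.predict(X_val))
    if score > best_score:
        best_lr, best_score = lr, score

# Touch the holdout set once, after the learning rate has been chosen.
final_model = GradientBoostingClassifier(n_estimators=20, learning_rate=best_lr,
                                         random_state=42)
final_model.fit(X_train, y_train)
hold_pred = final_model.predict(X_hold)
print("chosen learning_rate:", best_lr)
print("holdout balanced accuracy:", balanced_accuracy_score(y_hold, hold_pred))
print("holdout macro F1:", f1_score(y_hold, hold_pred, average="macro"))

# Toy case: always predicting the majority class on imbalanced labels.
y_true = [0] * 8 + [1, 2]
y_pred = [0] * 10
toy_accuracy = accuracy_score(y_true, y_pred)           # 8 of 10 correct
toy_balanced = balanced_accuracy_score(y_true, y_pred)  # mean per-class recall
toy_macro_f1 = f1_score(y_true, y_pred, average="macro", zero_division=0)
print(toy_accuracy, toy_balanced, toy_macro_f1)
```

In the toy case accuracy is 0.8 even though two of the three classes are never predicted, while balanced accuracy is 1/3 and macro F1 is 8/27 (about 0.30). That gap is the usual reason to look beyond accuracy when the classes are imbalanced.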
