RandomForestRegressor for classification problems
Question
I've been doing the Applied Machine Learning in Python course on Coursera. On the assignment for week 4, I found something interesting. During my first attempt to complete the assignment, I tried using the RandomForestClassifier from sklearn to predict labels. However, the model was overfitting and showing poor test accuracy results. As an experiment, I switched to RandomForestRegressor. Surprisingly, not only did it not overfit, but the test accuracy was also much higher. So, why does RandomForestRegressor perform much better on a binary classification problem?
Answer 1
Score: 2
The Random Forest regressor differs from the Random Forest classifier in how it combines the individual decision trees:
- The classifier uses the mode of the classes predicted by the trees (scikit-learn's RandomForestClassifier averages the trees' predicted class probabilities and picks the most probable class, rather than counting hard votes).
- The regressor uses the mean of the values predicted by the trees.
Because of this difference, the two models can give different results, and in some cases the regressor may perform better than the classifier.
That said, if you tune your hyperparameters correctly, the classifier should perform better than the regressor on a classification problem.
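The two aggregation rules described in this answer can be sketched in a few lines of plain Python. This is only an illustration, not scikit-learn's implementation: the per-tree votes are made-up numbers, the helper names `classifier_style` and `regressor_style` are hypothetical, and it assumes each tree outputs a hard 0 or 1 for a binary problem.

```python
from statistics import mean, mode

# Hypothetical 0/1 outputs from five trees for three samples (made-up data).
tree_votes = [
    [1, 1, 0, 1, 0],  # sample A
    [0, 0, 0, 1, 0],  # sample B
    [1, 0, 1, 0, 1],  # sample C
]

def classifier_style(votes):
    """Majority vote: return the most common class among the trees."""
    return mode(votes)

def regressor_style(votes):
    """Mean of the trees' outputs: a continuous score between 0 and 1."""
    return mean(votes)

hard_labels = [classifier_style(v) for v in tree_votes]
soft_scores = [regressor_style(v) for v in tree_votes]

# Turning the regressor's scores back into labels needs an explicit threshold
# (0.5 is an assumption here, not something the regressor chooses for you).
thresholded = [int(s >= 0.5) for s in soft_scores]

print(hard_labels)
print(soft_scores)
print(thresholded)
```

With hard 0/1 votes and a 0.5 threshold, the thresholded mean and the majority vote agree. The practical difference is that the regressor exposes a continuous score, which matters if the evaluation metric is computed from scores rather than from hard labels.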