RFE Termination Using RMSE with AutoFSelector
Question
To mimic how caret performs RFE and select features that produce the lowest RMSE, it was suggested to use the archive.
I am using AutoFSelector and nested resampling with the following code:
```r
library(mlr3)          # as_task_regr(), lrn(), rsmp(), msr(), resample(), set_threads()
library(mlr3learners)  # provides the "regr.ranger" learner
library(mlr3fselect)   # AutoFSelector, fs(), trm()

ARMSS <- read.csv("Index ARMSS Proteomics Final.csv", row.names = 1)
set.seed(123, "L'Ecuyer")
task = as_task_regr(ARMSS, target = "Index.ARMSS")
learner = lrn("regr.ranger", importance = "impurity")
set_threads(learner, n = 8)
resampling_inner = rsmp("cv", folds = 7)
measure = msr("regr.rmse")
terminator = trm("none")
at = AutoFSelector$new(
  learner = learner,
  resampling = resampling_inner,
  measure = measure,
  terminator = terminator,
  fselect = fs("rfe", n_features = 1, feature_fraction = 0.5, recursive = FALSE),
  store_models = TRUE
)
resampling_outer = rsmp("repeated_cv", folds = 10, repeats = 10)
rr = resample(task, at, resampling_outer, store_models = TRUE)
```
Should I use `extract_inner_fselect_archives()` to identify, for each iteration, the feature subset with the smallest RMSE, record the selected features, and then rerun the code above with the `n_features` argument changed? How do I reconcile differences across iterations in the number of features and/or the features selected?
Answer 1 (score: 2)
> Nested resampling is a statistical procedure to estimate the predictive performance of the model trained on the full dataset, it is not a procedure to select optimal hyperparameters. Nested resampling produces many hyperparameter configurations which should not be used to construct a final model.
For details, see mlr3book Chapter 4 - Optimization.
The same is true for feature selection. You don't select a feature set with nested resampling. You estimate the performance of the final model.
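A minimal sketch of what the nested resampling result is good for, assuming the mlr3, mlr3learners, mlr3fselect and ranger packages are installed. It uses the built-in `mtcars` task and 3-fold inner and outer loops as stand-ins for the question's ARMSS data and 7-fold / 10x10 settings, purely to keep it fast. Constructor arguments mirror the question's code; check `?AutoFSelector` for the names used by your installed mlr3fselect version. The aggregated outer RMSE is the performance estimate. `extract_inner_fselect_results()` is used only to look at the subsets chosen in each outer fold, which may differ from one another and are not meant to be merged into a final feature set.

```r
library(mlr3)
library(mlr3learners)
library(mlr3fselect)

set.seed(123)
task = tsk("mtcars")  # stand-in for the ARMSS task (assumption for this sketch)

at = AutoFSelector$new(
  learner = lrn("regr.ranger", importance = "impurity"),
  resampling = rsmp("cv", folds = 3),   # inner loop: drives RFE
  measure = msr("regr.rmse"),
  terminator = trm("none"),             # RFE stops on its own
  fselect = fs("rfe", n_features = 1, feature_fraction = 0.5),
  store_models = TRUE                   # RFE reads importances from stored models
)

# Outer loop: estimates the RMSE of the whole procedure (RFE + ranger)
rr = resample(task, at, rsmp("cv", folds = 3), store_models = TRUE)
rmse_estimate = rr$aggregate(msr("regr.rmse"))
print(rmse_estimate)

# One selected subset per outer fold, for inspection only
inner = extract_inner_fselect_results(rr)
print(inner$features)
print(inner$regr.rmse)
```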
> it was suggested to use the archive
Without nested resampling, you just call `instance$result` or `at$fselect_result` to get the feature subset with the lowest RMSE.
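The approach can be sketched as follows, under the same assumptions (mlr3, mlr3learners, mlr3fselect and ranger installed; built-in `mtcars` task as a stand-in for ARMSS; 3-fold inner CV; argument names copied from the question). The AutoFSelector is trained once on the full task, so RFE runs only with its inner cross-validation, and `at$fselect_result` then holds the subset with the lowest cross-validated RMSE. The archive, which stores every subset RFE evaluated, is printed only for inspection.

```r
library(mlr3)
library(mlr3learners)
library(mlr3fselect)

set.seed(123)
task = tsk("mtcars")  # stand-in for the ARMSS task (assumption for this sketch)

at = AutoFSelector$new(
  learner = lrn("regr.ranger", importance = "impurity"),
  resampling = rsmp("cv", folds = 3),
  measure = msr("regr.rmse"),
  terminator = trm("none"),
  fselect = fs("rfe", n_features = 1, feature_fraction = 0.5),
  store_models = TRUE
)

# Train once on the full data; RFE uses only the inner CV
at$train(task)

# Subset with the lowest cross-validated RMSE
best = at$fselect_result
print(best$features[[1]])
print(best$regr.rmse)

# Every subset evaluated during RFE
archive = as.data.table(at$fselect_instance$archive)
print(nrow(archive))
```

The performance to report for this final model is the aggregated outer RMSE from a nested resampling run, not the inner `regr.rmse` stored in `fselect_result`, which was used to pick the subset and is therefore optimistic.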