RFE Termination Using RMSE with AutoFSelector

Question

To mimic how caret performs RFE and select features that produce the lowest RMSE, it was suggested to use the archive.

I am using AutoFSelector and nested resampling with the following code:


# mlr3verse attaches mlr3, mlr3fselect and mlr3learners
library(mlr3verse)

ARMSS <- read.csv("Index ARMSS Proteomics Final.csv", row.names = 1)

set.seed(123, "L'Ecuyer")

task = as_task_regr(ARMSS, target = "Index.ARMSS")

learner = lrn("regr.ranger", importance = "impurity")

set_threads(learner, n = 8)

# Inner loop: 7-fold CV scored by RMSE; RFE runs until n_features is reached
resampling_inner = rsmp("cv", folds = 7)
measure = msr("regr.rmse")
terminator = trm("none")

at = AutoFSelector$new(
  learner = learner,
  resampling = resampling_inner,
  measure = measure,
  terminator = terminator,
  fselector = fs("rfe", n_features = 1, feature_fraction = 0.5, recursive = FALSE),
  store_models = TRUE)

# Outer loop: 10 x 10-fold repeated CV to estimate performance
resampling_outer = rsmp("repeated_cv", folds = 10, repeats = 10)

rr = resample(task, at, resampling_outer, store_models = TRUE)

Should I use the extract_inner_fselect_archives() function to identify, for each iteration, the smallest RMSE and the features that were selected, and then rerun the code above with the n_features argument changed? How do I reconcile differences across iterations in the number of features and/or the features selected?

Answer 1

Score: 2

> Nested resampling is a statistical procedure to estimate the predictive performance of the model trained on the full dataset, it is not a procedure to select optimal hyperparameters. Nested resampling produces many hyperparameter configurations which should not be used to construct a final model.

mlr3book Chapter 4 - Optimization.

The same is true for feature selection. You don't select a feature set with nested resampling. You estimate the performance of the final model.
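As a rough illustration of that point, the sketch below runs a small nested resampling and looks at what each outer fold selected. It is only a sketch under assumptions: the built-in mtcars task stands in for the proteomics data, the fold counts are reduced to keep it quick, and it assumes mlr3verse and ranger are installed.

```r
library(mlr3verse)  # attaches mlr3, mlr3fselect and mlr3learners

set.seed(123)

# Stand-in data (assumption): built-in mtcars task instead of the ARMSS file
task = tsk("mtcars")

at = AutoFSelector$new(
  learner = lrn("regr.ranger", importance = "impurity"),
  resampling = rsmp("cv", folds = 3),
  measure = msr("regr.rmse"),
  terminator = trm("none"),
  fselector = fs("rfe", n_features = 1, feature_fraction = 0.5),
  store_models = TRUE)

# Outer loop: each fold runs its own RFE on that fold's training data
rr = resample(task, at, rsmp("cv", folds = 3), store_models = TRUE)

# The quantity nested resampling is meant to deliver:
# an estimate of how well the whole procedure (RFE + ranger) performs
outer_rmse = rr$aggregate(msr("regr.rmse"))
print(outer_rmse)

# One row per outer fold; the selected subsets may differ between folds.
# They are diagnostics, not candidates for the final feature set.
inner = extract_inner_fselect_results(rr)
print(inner)
```

If the rows of `inner` disagree with each other, that says something about how stable the selection is; it is not something to reconcile into a single final subset.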

> it was suggested to use the archive

Without nested resampling, you just call instance$result (on the feature selection instance) or at$fselect_result (on the trained AutoFSelector) to get the feature subset with the lowest RMSE.
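A minimal sketch of that non-nested workflow follows. It again uses the built-in mtcars task and three folds as stand-in assumptions (not the original data or settings), and assumes mlr3verse and ranger are installed. The AutoFSelector is trained once on the full task, and the result and archive are read from it.

```r
library(mlr3verse)  # attaches mlr3, mlr3fselect and mlr3learners

set.seed(123)

# Stand-in data (assumption): built-in mtcars task instead of the ARMSS file
task = tsk("mtcars")

at = AutoFSelector$new(
  learner = lrn("regr.ranger", importance = "impurity"),
  resampling = rsmp("cv", folds = 3),
  measure = msr("regr.rmse"),
  terminator = trm("none"),
  fselector = fs("rfe", n_features = 1, feature_fraction = 0.5),
  store_models = TRUE)

# A single RFE run on the full task, no outer loop
at$train(task)

# Feature subset with the lowest cross-validated RMSE
print(at$fselect_result)
best_features = at$fselect_instance$result_feature_set
print(best_features)

# The archive holds every subset RFE evaluated, one row each
archive = as.data.table(at$archive)
best_row = archive[which.min(archive$regr.rmse), ]
print(best_row$regr.rmse)
```

After `at$train(task)`, the AutoFSelector has already refit the learner on the selected subset, so `at$predict()` can be used directly on new data.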

huangapple
  • Posted on February 7, 2023, 02:44:09
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75365382.html