February 7, 2023 02:44:09 · go

RFE Termination Using RMSE with AutoFSelector

Question

To mimic how caret performs RFE and select features that produce the lowest RMSE, it was suggested to use the archive.

I am using AutoFSelector and nested resampling with the following code:


library(mlr3)
library(mlr3learners)  # provides regr.ranger
library(mlr3fselect)   # provides AutoFSelector and the "rfe" fselector

ARMSS <- read.csv("Index ARMSS Proteomics Final.csv", row.names = 1)
set.seed(123, "L'Ecuyer")
task = as_task_regr(ARMSS, target = "Index.ARMSS")
learner = lrn("regr.ranger", importance = "impurity")
set_threads(learner, n = 8)
resampling_inner = rsmp("cv", folds = 7)
measure = msr("regr.rmse")
terminator = trm("none")  # RFE stops by itself once n_features is reached
at = AutoFSelector$new(
  learner = learner,
  resampling = resampling_inner,
  measure = measure,
  terminator = terminator,
  fselector = fs("rfe", n_features = 1, feature_fraction = 0.5, recursive = FALSE),
  store_models = TRUE)
resampling_outer = rsmp("repeated_cv", folds = 10, repeats = 10)
rr = resample(task, at, resampling_outer, store_models = TRUE)

Should I use extract_inner_fselect_archives() to identify, in each iteration, the subset with the smallest RMSE and the features it contains, and then rerun the code above with the n_features argument changed? How do I reconcile differences across iterations in the number of features and/or in the features selected?

Answer 1

Score: 2

> Nested resampling is a statistical procedure to estimate the predictive performance of the model trained on the full dataset, it is not a procedure to select optimal hyperparameters. Nested resampling produces many hyperparameter configurations which should not be used to construct a final model.

Source: mlr3book, Chapter 4 (Optimization).

The same is true for feature selection: you don't select a feature set with nested resampling; you use nested resampling to estimate the performance of the final model.
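A minimal sketch of what nested resampling does give you. It assumes the built-in mtcars task as a stand-in for the ARMSS data and uses 3-fold inner and outer CV instead of 7-fold and 10x10 repeated CV so that it runs quickly. The aggregated outer RMSE is the performance estimate; extract_inner_fselect_results() lists one selected subset per outer fold, and those subsets are expected to differ.

```r
# Minimal sketch: nested resampling yields a performance estimate, not a feature set.
# Assumptions: tsk("mtcars") stands in for the ARMSS task; fold counts are reduced.
library(mlr3)
library(mlr3learners)
library(mlr3fselect)

set.seed(123)
task = tsk("mtcars")

at = AutoFSelector$new(
  fselector = fs("rfe", n_features = 1, feature_fraction = 0.5, recursive = FALSE),
  learner = lrn("regr.ranger", importance = "impurity"),
  resampling = rsmp("cv", folds = 3),
  measure = msr("regr.rmse"),
  terminator = trm("none"),
  store_models = TRUE
)

# Outer loop: each outer training set gets its own complete RFE run
rr = resample(task, at, rsmp("cv", folds = 3), store_models = TRUE)

# Outer-loop RMSE: estimate for the whole "select features, then fit" procedure
rmse = rr$aggregate(msr("regr.rmse"))
print(rmse)

# One selected subset per outer fold; these are intermediate results and are
# not meant to be merged into a final feature set
inner = extract_inner_fselect_results(rr)
print(inner)
```

The point of the sketch is the split of responsibilities: rmse is what you report, while the per-fold subsets in inner only show how stable the selection procedure is.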

> it was suggested to use the archive

Without nested resampling, that is, after running the feature selection once on the full task, you just call instance$result or at$fselect_result to get the feature subset with the lowest RMSE.
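A minimal sketch of the non-nested workflow, again assuming mtcars as a stand-in task and 3-fold CV: train the AutoFSelector once on the full task, read off the subset with the lowest RMSE, and tabulate the archive by subset size, which is roughly the RMSE-versus-size profile that caret's rfe() prints. The column name n_selected is introduced here for illustration only.

```r
# Minimal sketch: select features once on the full data (no nested resampling).
# Assumptions: tsk("mtcars") stands in for the ARMSS task; fold count is reduced.
library(mlr3)
library(mlr3learners)
library(mlr3fselect)
library(data.table)

set.seed(123)
task = tsk("mtcars")

at = AutoFSelector$new(
  fselector = fs("rfe", n_features = 1, feature_fraction = 0.5, recursive = FALSE),
  learner = lrn("regr.ranger", importance = "impurity"),
  resampling = rsmp("cv", folds = 3),
  measure = msr("regr.rmse"),
  terminator = trm("none"),
  store_models = TRUE
)

# Runs RFE on the full task, then refits the learner on the best subset
at$train(task)

# Result row with the lowest cross-validated RMSE
best = at$fselect_result
# The corresponding feature names
selected = at$fselect_instance$result_feature_set
print(selected)

# RMSE for every evaluated subset, ordered by subset size (caret-like profile);
# n_selected is a hypothetical helper column counting the TRUE feature flags
archive = as.data.table(at$fselect_instance$archive)
archive[, n_selected := rowSums(.SD), .SDcols = task$feature_names]
print(archive[order(n_selected), .(n_selected, regr.rmse)])
```

The trained at can then be used directly for prediction on new data, since it already holds the learner refit on the selected features.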
