May 7, 2023 19:54:59


How Do I Perform Hyperparameter Optimization for a Non-Toy Dataset in R Using mlr3hyperband?

Question

I have a dataset, let's call it "train.csv", which I load with

library(readr)
train = na.omit(read_csv('train.csv'))

and want to use to train an XGBoost predictive model. The mlr3hyperband documentation gives the following example for hyperparameter optimization:

library(mlr3hyperband)
library(mlr3learners)
learner = lrn("classif.xgboost",
  nrounds           = to_tune(p_int(27, 243, tags = "budget")),
  eta               = to_tune(1e-4, 1, logscale = TRUE),
  max_depth         = to_tune(1, 20),
  colsample_bytree  = to_tune(1e-1, 1),
  colsample_bylevel = to_tune(1e-1, 1),
  lambda            = to_tune(1e-3, 1e3, logscale = TRUE),
  alpha             = to_tune(1e-3, 1e3, logscale = TRUE),
  subsample         = to_tune(1e-1, 1)
)
instance = tune(
  tnr("hyperband", eta = 3),
  task = tsk("pima"), # This is the part I need to change.
  learner = learner,
  resampling = rsmp("cv", folds = 3),
  measures = msr("classif.ce")
)
instance$result

However, the "task" argument of tune() points to a toy dataset, the built-in pima dataset. I want to tune the model on train.csv instead, but I'm not sure how. Removing the "task" argument entirely doesn't work because it is required, and passing the data frame itself (directly or through tsk()) doesn't work either:

# None of the below work.
task = tsk(train)
task = train
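For context on why these attempts fail: tsk() is a shortcut that looks a task up by its string key in the mlr_tasks dictionary, so it can only return tasks that ship with (or were registered in) mlr3, and tune() expects an mlr3 Task object rather than a data frame. A small sketch illustrating the lookup, assuming only the mlr3 package is installed:

```r
library(mlr3)

# tsk() retrieves a predefined task by its string key from the mlr_tasks dictionary.
keys = mlr_tasks$keys()
print(keys)  # the ids of the built-in tasks, "pima" among them

pima = tsk("pima")
print(class(pima))  # an R6 TaskClassif object, not a data.frame

# A data frame is not a valid dictionary key, so tsk(<data frame>) raises an error.
# A custom dataset has to be converted with as_task_classif() or as_task_regr() instead.
failed = tryCatch({
  tsk(iris)
  FALSE
}, error = function(e) TRUE)
print(failed)
```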


Answer 1

Score: 1

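According to the mlr3book, you need to construct your own task:

> We can also load the data separately and convert it to a task,
> without using the tsk() function that mlr3 provides.
> If the data we want to use does not come with mlr3,
> it has to be done this way.

"This way" means, e.g., as_task_regr() or as_task_classif() (see section 2.1.1, Constructing Tasks).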
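Applied to the question, a minimal sketch of the full workflow might look like the following. Assumptions: iris is written to a temporary CSV as a stand-in for train.csv so the code runs anywhere, "Species" stands in for the real label column (a hypothetical name for your data), the search space is trimmed to keep the sketch short, and the xgboost package is installed. The only substantive change from the documentation example is that the task comes from as_task_classif() instead of tsk("pima").

```r
library(mlr3)
library(mlr3tuning)     # tune(), to_tune()
library(mlr3hyperband)  # the "hyperband" tuner
library(mlr3learners)   # the "classif.xgboost" learner (needs the xgboost package)

# Stand-in for train.csv: write iris to a temporary CSV file.
# With the real file this is just read.csv("train.csv") or readr::read_csv("train.csv").
csv_path = tempfile(fileext = ".csv")
write.csv(iris, csv_path, row.names = FALSE)
train = na.omit(read.csv(csv_path))

# Hypothetical target column; replace "Species" with the label column of train.csv.
# Classification targets should be factors.
train$Species = factor(train$Species)

# Convert the data frame into an mlr3 task; this replaces tsk("pima").
# Note: xgboost only accepts numeric, integer, and logical features.
task = as_task_classif(train, target = "Species", id = "train")

learner = lrn("classif.xgboost",
  nrounds   = to_tune(p_int(27, 243, tags = "budget")),
  eta       = to_tune(1e-4, 1, logscale = TRUE),
  max_depth = to_tune(1, 20),
  subsample = to_tune(1e-1, 1)
)

instance = tune(
  tnr("hyperband", eta = 3),
  task = task,
  learner = learner,
  resampling = rsmp("cv", folds = 3),
  measures = msr("classif.ce")
)
print(instance$result)
```

After tuning, instance$result_learner_param_vals holds the best configuration; assigning it with learner$param_set$values = instance$result_learner_param_vals and then calling learner$train(task) should fit a final model on the whole dataset.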

Disclaimer: I have no mlr3 experience of my own.
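If the target column of train.csv is numeric rather than categorical, the other constructor mentioned above, as_task_regr(), applies, and the learner and measure switch to their regression counterparts. A sketch under the same kind of assumptions: mtcars written to a temporary CSV stands in for the real file, "mpg" is a hypothetical numeric target, and the search space is trimmed for brevity.

```r
library(mlr3)
library(mlr3tuning)     # tune(), to_tune()
library(mlr3hyperband)  # the "hyperband" tuner
library(mlr3learners)   # the "regr.xgboost" learner (needs the xgboost package)

# Stand-in data: mtcars written to a temporary CSV file.
csv_path = tempfile(fileext = ".csv")
write.csv(mtcars, csv_path, row.names = FALSE)
train = na.omit(read.csv(csv_path))

# Numeric target -> regression task instead of a classification task.
task = as_task_regr(train, target = "mpg", id = "train")

# Regression learner; nrounds is still the budget parameter for hyperband.
learner = lrn("regr.xgboost",
  nrounds   = to_tune(p_int(27, 243, tags = "budget")),
  eta       = to_tune(1e-4, 1, logscale = TRUE),
  max_depth = to_tune(1, 20)
)

instance = tune(
  tnr("hyperband", eta = 3),
  task = task,
  learner = learner,
  resampling = rsmp("cv", folds = 3),
  measures = msr("regr.rmse")  # a regression measure instead of classif.ce
)
print(instance$result)
```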

