How Do I Perform Hyperparameter Optimization for a Non-Toy Dataset in R Using mlr3hyperband?

Question

I have a dataset, let's call it "train.csv", which I load like this:

library(readr) # provides read_csv()
train = na.omit(read_csv('train.csv'))

I want to use it to train an XGBoost predictive model. Following the example in the mlr3hyperband documentation, the steps to perform hyperparameter optimization are as follows:

library(mlr3hyperband)
library(mlr3learners)

learner = lrn("classif.xgboost",
  nrounds           = to_tune(p_int(27, 243, tags = "budget")),
  eta               = to_tune(1e-4, 1, logscale = TRUE),
  max_depth         = to_tune(1, 20),
  colsample_bytree  = to_tune(1e-1, 1),
  colsample_bylevel = to_tune(1e-1, 1),
  lambda            = to_tune(1e-3, 1e3, logscale = TRUE),
  alpha             = to_tune(1e-3, 1e3, logscale = TRUE),
  subsample         = to_tune(1e-1, 1)
)

instance = tune(
  tnr("hyperband", eta = 3),
  task = tsk("pima"), # This is the point of challenge.
  learner = learner,
  resampling = rsmp("cv", folds = 3),
  measures = msr("classif.ce")
)

instance$result

However, the "task" argument passed to tune() refers to a toy dataset, the pima dataset. I want to tune the model using train.csv instead, but I'm not sure how to go about it. I've tried removing the task argument entirely, but tune() needs it to run. I've also tried passing the data frame itself as the task, but that doesn't work either.

# None of the below work.
task = tsk(train)
task = train

Answer 1

Score: 1

According to the mlr3book, you need to construct your own task:

> We can also load the data separately and convert it to a task,
> without using the tsk() function that mlr3 provides.
> If the data we want to use does not come with mlr3,
> it has to be done this way.

"This way" means using, for example, as_task_regr() or as_task_classif() (see section 2.1.1, Constructing Tasks).

Disclaimer: I have no experience with mlr3 myself.
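
As a rough illustration of the suggestion above, here is a minimal sketch of building a classification task from a data frame with as_task_classif(). The data frame below is a made-up stand-in for the contents of train.csv, and the column names (including the target column "label") are assumptions for illustration only; substitute the real outcome column of your data.

```r
library(mlr3)

# Hypothetical stand-in for the data read from train.csv.
# Column names, including the target "label", are assumptions.
set.seed(1)
train = data.frame(
  x1 = rnorm(100),
  x2 = rnorm(100),
  label = factor(sample(c("neg", "pos"), 100, replace = TRUE))
)

# Wrap the data frame in a classification task.
# `target` names the outcome column; all other columns become features.
task = as_task_classif(train, target = "label", id = "train")

print(task)
```

The resulting object can then be passed to tune() in place of tsk("pima"), i.e. task = task. Note that classif.xgboost works with numeric features, so character or factor feature columns in a real CSV would likely need to be encoded first. For a numeric outcome, as_task_regr() together with a regression learner would be the analogous route.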

huangapple
  • Posted on May 7, 2023 at 19:54:59
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76193768.html