How Do I Perform Hyperparameter Optimization for a Non-Toy Dataset in R Using mlr3hyperband?
Question
I have a dataset, let's call it "train.csv",
train = na.omit(read_csv('train.csv'))
that I want to use to train an XGBoost predictive model. In the example from the mlr3hyperband documentation, hyperparameter optimization is performed as follows:
library(mlr3hyperband)
library(mlr3learners)
learner = lrn("classif.xgboost",
  nrounds = to_tune(p_int(27, 243, tags = "budget")),
  eta = to_tune(1e-4, 1, logscale = TRUE),
  max_depth = to_tune(1, 20),
  colsample_bytree = to_tune(1e-1, 1),
  colsample_bylevel = to_tune(1e-1, 1),
  lambda = to_tune(1e-3, 1e3, logscale = TRUE),
  alpha = to_tune(1e-3, 1e3, logscale = TRUE),
  subsample = to_tune(1e-1, 1)
)
instance = tune(
  tnr("hyperband", eta = 3),
  task = tsk("pima"), # This is the point of challenge.
  learner = learner,
  resampling = rsmp("cv", folds = 3),
  measures = msr("classif.ce")
)
instance$result
However, the task argument in the tune() call refers to a toy dataset, the pima dataset. I want to tune the model on train.csv instead, but I'm not sure how to go about it. I've tried removing the task argument entirely, but the function needs it to run. I've also tried passing the data frame itself as the task, but that doesn't work either.
# None of the below work.
task = tsk(train)
task = train
Answer 1
Score: 1
According to the mlr3book, you need to construct your own task:
> We can also load the data separately and convert it to a task,
> without using the tsk() function that mlr3 provides.
> If the data we want to use does not come with mlr3,
> it has to be done this way.
Here, "this way" means a converter function such as as_task_regr() or as_task_classif() (see section 2.1.1, Constructing Tasks, in the mlr3book).
Disclaimer: I have no hands-on experience with mlr3 myself.
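To make the suggestion concrete, here is a minimal sketch of building a classification task from a data frame with as_task_classif(). The data frame is a synthetic stand-in for the asker's train.csv, and the column names (x1, x2, label) are hypothetical; the real target column name has to be substituted. It also assumes the target is stored as a factor and that the features are numeric, which, as far as I know, is what classif.xgboost expects.

```r
library(mlr3)

# Synthetic stand-in for the real data. In the question this would be
# something like: train = na.omit(readr::read_csv("train.csv"))
# The column names x1, x2 and label are hypothetical.
set.seed(1)
train = data.frame(
  x1 = rnorm(100),
  x2 = rnorm(100),
  label = factor(sample(c("neg", "pos"), 100, replace = TRUE))
)

# Wrap the data frame in a classification task; `target` names the
# column to predict, `id` is just a label for the task.
task = as_task_classif(train, target = "label", id = "train")

print(task)
```

The resulting object can then be passed as task = task in place of task = tsk("pima") in the tune() call from the question. For a numeric target, as_task_regr() would be the analogous converter, paired with a regression learner and measure.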