Missing value handling with imputation in a nested resampling procedure such that there is no information bleed from train to test

Question

I am reading the documentation for nested resampling in the mlr3tuning package, and I cannot find a way to handle NA values that avoids information bleeding between the training and hold-out sets. Such leakage would produce overly optimistic performance statistics. Ideally, I would like to split my data in a nested resampling procedure such that:

full_data = N

train = N - holdout

test = holdout

I could then impute the train and test datasets separately, fit the model on train, and predict on test. For each further outer loop I would select a new holdout and train set from the full dataset, impute them separately again, then train and predict, repeating for the number of outer_loops.

Is there a way of doing this? Am I missing something obvious?

Answer 1

Score: 3

mlr3 handles all of this for you if you use pipelines (see the relevant part of the mlr3 book). If you make imputation part of such a pipeline, the imputation step is trained on the training data of each split and then applied to the corresponding test data, just like the model itself.
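
As a rough illustration of what such a pipeline could look like, here is a minimal sketch of nested resampling with imputation inside the learner. The choice of the "pima" task, median imputation, the rpart learner, the tuned `cp` range, and the fold and budget sizes are all assumptions made for this example, not part of the original question. The sketch also assumes reasonably recent versions of mlr3, mlr3pipelines, mlr3tuning and paradox.

```r
library(mlr3)
library(mlr3pipelines)
library(mlr3tuning)
library(paradox)

set.seed(1)

# Example task with missing values in several numeric features (assumed choice).
task = tsk("pima")

# Imputation and model combined into one learner, so both are fitted
# on the training rows of every resampling split.
graph = po("imputemedian") %>>% lrn("classif.rpart")
learner = as_learner(graph)

# Inner loop: tune the tree's complexity parameter with 3-fold CV.
# Inside a graph, parameters are prefixed with the PipeOp id.
at = AutoTuner$new(
  learner = learner,
  resampling = rsmp("cv", folds = 3),
  measure = msr("classif.ce"),
  terminator = trm("evals", n_evals = 10),
  tuner = tnr("random_search"),
  search_space = ps(classif.rpart.cp = p_dbl(lower = 0.001, upper = 0.1))
)

# Outer loop: 5-fold CV around the whole tune-impute-fit procedure.
rr = resample(task, at, rsmp("cv", folds = 5))
rr$aggregate(msr("classif.ce"))
```

Because the imputation step lives inside the learner that `resample()` and the `AutoTuner` call, neither the inner nor the outer test rows should contribute to the imputed values.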

As a brief explanation: just as with the machine learning model, you do not want to make any adjustments based on the test set. In particular, you should not impute based on test data. Doing so causes problems similar to fitting a model on test data, i.e. biased evaluation results that may not be representative of the true generalization error.
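
To make the train/test distinction concrete, the following sketch trains an imputation step on training rows only and then applies it to held-out rows. The manual 80/20 split, the "pima" task and median imputation are assumptions for illustration. Inspecting `$state$model` relies on how mlr3pipelines imputation PipeOps store their fitted values, so treat that line as an assumption to check against your installed version.

```r
library(mlr3)
library(mlr3pipelines)

set.seed(1)

task = tsk("pima")

# Manual hold-out split (illustrative; resample() does this for you).
train_ids = sample(task$row_ids, floor(0.8 * task$nrow))
test_ids = setdiff(task$row_ids, train_ids)
train_task = task$clone()$filter(train_ids)
test_task = task$clone()$filter(test_ids)

imp = po("imputemedian")

# $train() learns the medians from the training rows only.
train_out = imp$train(list(train_task))[[1]]

# $predict() reuses those stored medians; the test rows are not used to fit anything.
test_out = imp$predict(list(test_task))[[1]]

# Fitted imputation values per feature (assumed location in the state).
imp$state$model

# Remaining missing values per column after imputation.
train_out$missings()
test_out$missings()
```

The missing test values are filled with statistics computed from the training rows, which is what a pipeline does inside every resampling iteration.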

huangapple
  • Published on February 17, 2023, 23:48:45
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75486454.html