Use multiple strata for initial_split?
Question
I am working with biological data (genes) that have multiple characteristics, which I want to have reflected properly in both my training and test data.
However, the initial_split function only accepts a single strata column. Is there a good way to create an initial split of my data using multiple strata? Preferably using tidymodels / tidyverse.
Thank you!
Answer 1
Score: 2
You would have to make a composite column to stratify on. We've confined the strata to one column on purpose: when several variables are combined, the resulting sample sizes can get very small and you may not be able to stratify at all.
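As a minimal sketch of the composite-column idea, assuming a made-up gene table (the column names biotype, chromosome and strata_combo are hypothetical, not from the question), the stratification variables can be combined with interaction() and the result passed to strata:

```r
library(dplyr)
library(rsample)

set.seed(123)

# Hypothetical gene table; all column names are illustrative assumptions.
genes <- tibble(
  gene_id    = paste0("g", 1:400),
  biotype    = sample(c("coding", "noncoding"), 400, replace = TRUE),
  chromosome = sample(c("chr1", "chr2", "chrX"), 400, replace = TRUE),
  expression = rnorm(400)
)

# Combine the stratification variables into one composite factor column.
genes <- genes %>%
  mutate(strata_combo = interaction(biotype, chromosome, sep = "_", drop = TRUE))

# Inspect the cell sizes first: very small cells cannot be stratified reliably.
count(genes, strata_combo)

# Stratify the split on the composite column.
split <- initial_split(genes, prop = 0.8, strata = strata_combo)
train <- training(split)
test  <- testing(split)
```

Checking the counts before splitting matters because initial_split pools strata that make up only a small fraction of the data (see its pool argument), so rare combinations may not be stratified separately.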
Another approach that you can use (and that I will eventually add a PR for) is twinning, via the corresponding R package.
If you still want an initial_split object, you can make one using rsample::make_splits from the twinning results.
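A rough sketch of the make_splits step, under stated assumptions: the data frame and its columns are invented, and the test-set row indices are drawn with sample() purely as a stand-in for whatever indices the twinning package returns for the smaller twin.

```r
library(rsample)

set.seed(123)

# Hypothetical data standing in for the gene table.
genes <- data.frame(
  gene_id    = paste0("g", 1:100),
  gc_content = runif(100),
  length_kb  = rexp(100)
)

# Stand-in for the test-set row indices. In the twinning workflow these
# would come from the twinning package rather than from sample().
test_idx  <- sort(sample(nrow(genes), 20))
train_idx <- setdiff(seq_len(nrow(genes)), test_idx)

# Build a split object from explicit analysis/assessment row indices.
split <- make_splits(
  list(analysis = train_idx, assessment = test_idx),
  data = genes
)

train <- training(split)
test  <- testing(split)
```

The result is an rsplit object, so training() and testing() can be used on it in the same way as on the output of initial_split.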