How to run Dirichlet Regression with a big data set in R?
Question
I would like to run a Dirichlet regression on a large data set using the DirichletReg package in R. I currently have a data.frame with 37 columns and ~13,000,000 rows.
However, running this model on all of my data instantly crashes R. I am using a Linux machine with 16 cores and 128 GB of memory. Even cutting my data down to only 1000 points still causes R to crash and restart almost immediately.
Am I doing something wrong? Is there any way I can parallelize this operation to get the model to run?
I am running the model with the following syntax:
data.2 <- data
data.2$y_variable <- DR_data(data[,c(33:35)])
model <- DirichReg(y_variable ~ x_variable, data.2)
I have to create y_variable in a separate data.2 data.frame, because running data$y_variable <- DR_data(data[,c(33:35)]) directly on data
crashes R. I have no idea why this is.
Answer 1
Score: 1
This is a bit of a guess as to why it's 'crashing' R, but if it's due to RAM issues then you can update the table by reference, rather than making a shallow copy of the entire data:
library(data.table)
setDT(data)  # convert to a data.table in place, without copying
data[, y := DR_data(data[, 33:35])]  # add the response column by reference
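Since the crash reportedly happens even with 1000 rows, it may also be worth checking whether DirichletReg runs at all on a small, self-contained data set that holds only the columns the model needs. The following is a minimal sketch, not a diagnosis: the data are simulated stand-ins (three composition columns and one predictor), and the names raw and small are made up for illustration.

```r
library(DirichletReg)

set.seed(1)
n <- 1000

# Simulated stand-in for columns 33:35 of the real data:
# three positive columns, rescaled so that each row sums to 1.
raw <- matrix(rgamma(n * 3, shape = 2), ncol = 3)
raw <- raw / rowSums(raw)

# Keep only what the model needs: one predictor plus the response.
small <- data.frame(x_variable = rnorm(n))
small$y_variable <- DR_data(raw)

# Same model call as in the question, on the small data set.
model <- DirichReg(y_variable ~ x_variable, data = small)
summary(model)
```

If this runs but the same steps on a 1000-row slice of the real data still crash, the problem is more likely in the data or the session than in the size of the full table.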