Question

I would like to run a Dirichlet regression on a large data set using the DirichletReg package in R. I currently have a data.frame with 37 columns and ~13,000,000 rows.

However, running this model on all of my data instantly crashes R. I am using a Linux machine with 16 cores and 128 GB of memory. Even just cutting down my data to only 1000 points still causes R to almost immediately crash and restart.

Am I doing something wrong? Is there any way I can parallelize this operation to get this model to run?

I am running a model with the following syntax:

data.2 <- data
data.2$y_variable <- DR_data(data[, c(33:35)])
model <- DirichReg(y_variable ~ x_variable, data.2)

I have to create y_variable in a separate data.frame, data.2, because running data$y_variable <- DR_data(data[, c(33:35)]) crashes R. I have no idea why.
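Since the crash also happens with only 1000 rows, it may be worth checking the response columns themselves before calling DR_data(). Below is a minimal sketch in base R, assuming the three columns are meant to be proportions that lie strictly between 0 and 1 and sum to 1 per row. The data, the column names p1 to p3, and the helper check_composition() are all hypothetical stand-ins, not part of the DirichletReg package.

```r
# Hypothetical stand-in for the three response columns (33:35 in the question).
set.seed(42)
raw  <- matrix(rgamma(3000, shape = 2), ncol = 3)
comp <- as.data.frame(raw / rowSums(raw))   # scale each row to sum to 1
names(comp) <- c("p1", "p2", "p3")

# Hypothetical helper: count values a compositional response should not contain.
check_composition <- function(df, tol = 1e-8) {
  m <- as.matrix(df)
  list(
    n_missing      = sum(is.na(m)),                          # NA cells
    n_out_of_range = sum(m <= 0 | m >= 1, na.rm = TRUE),     # cells outside (0, 1)
    n_bad_row_sums = sum(abs(rowSums(m) - 1) > tol, na.rm = TRUE)  # rows not summing to 1
  )
}

report <- check_composition(comp)
str(report)

# A deliberately broken copy, to show what a failing report looks like.
bad <- comp
bad[1, 1] <- 0
bad_report <- check_composition(bad)
str(bad_report)
```

If any of the counts is non-zero on the real columns, that would be a cheaper thing to rule out than memory, although it would not by itself explain a hard crash.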

Answer 1

Score: 1

This is a bit of a guess as to why R is 'crashing', but if it's due to RAM, you can use data.table to update the table by reference rather than making a shallow copy of the entire data set:

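library(data.table)
setDT(data)  # convert the data.frame to a data.table in place, without copying
# Add the response as a new column by reference.
# Assumption: the matrix-like object returned by DR_data() can be stored in a data.table column; check the result.
data[, y := DR_data(data[, c(33:35)])]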
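To see what "by reference" means here, a small sketch with a toy table: address() from data.table reports where the table lives in memory, and it should be unchanged after := adds a column. The table dt and its columns are hypothetical, and the new column is a plain numeric vector, so this does not show that a DR_data() object can be stored the same way.

```r
library(data.table)

# Hypothetical toy table standing in for the real data.
dt <- data.table(a = c(0.2, 0.5, 0.1), b = c(0.3, 0.25, 0.6))

addr_before <- address(dt)   # memory address of the table before the update
dt[, total := a + b]         # := adds the column in place
addr_after <- address(dt)    # memory address after the update

# TRUE when the update did not allocate a new table.
same_object <- identical(addr_before, addr_after)
print(same_object)
print(names(dt))
```

With a base data.frame, data$new_col <- ... goes through a replacement function instead, which is where the extra (shallow) copy comes from.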

