Using glm in R for linear regression on a large dataframe - issues with column subsetting

Question

I am trying to use glm in R on a data frame containing about 1000 columns. I want to select a specific independent variable and run the model in a loop over each of the ~1000 columns representing the dependent variables.

As a test, the glm call works perfectly fine when I specify single columns using df$col1 for both my dependent and independent variables.

I can't seem to correctly subset a range of columns (below), and I keep getting this error no matter how I format the data frame:

'data' must be a data.frame, environment, or list

What I tried:

# df is my data frame
cols <- df[, 20:1112]

for (i in cols) {
    glm <- glm(df$col1 ~ ., data=df, family=gaussian)
}

Answer 1

Score: 0

It would be more idiomatic to loop over the column names and build each formula with reformulate:

```r
predvars <- names(df)[20:1112]
glm_list <- list()  ## presumably you want to save the results
for (pv in predvars) {
    glm_list[[pv]] <- glm(reformulate(pv, response = "col1"),
                          data = df, family = gaussian)
}
```

In fact, if you really just want to do a Gaussian GLM, then it will be slightly faster to use

```r
lm(reformulate(pv, response = "col1"), data = df)
```

in the loop instead.
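To see why swapping in lm changes nothing about the estimates, here is a minimal sketch on simulated data. The data frame `sim` and its columns `col1` and `x1` are made up for illustration and are not from the question. It checks that a Gaussian glm (with its default identity link) and lm return the same coefficients:

```r
# Simulated data; `sim`, `col1`, and `x1` are hypothetical names for illustration
set.seed(1)
sim <- data.frame(x1 = rnorm(50))
sim$col1 <- 2 + 3 * sim$x1 + rnorm(50)

form <- reformulate("x1", response = "col1")  # builds the formula col1 ~ x1

fit_glm <- glm(form, data = sim, family = gaussian)
fit_lm  <- lm(form, data = sim)

# Both are least-squares fits, so the coefficients should agree
# up to numerical tolerance
coef_match <- isTRUE(all.equal(coef(fit_glm), coef(fit_lm)))
coef_match
```

The comparison uses all.equal rather than identical because the two functions reach the same solution through different numerical routines.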

If you want to get fancy:

```r
formlist <- lapply(predvars, reformulate, response = "col1")
lm_list <- lapply(formlist, lm, data = df)
names(lm_list) <- predvars
```
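Once the models are fitted, the list is usually reduced to a table of estimates. The following is a sketch on simulated data and is not part of the original answer: the data frame `sim`, the predictor names `x1` to `x3`, and the helper `slope_row` are all hypothetical. It fits one lm per predictor as above, then collects each slope and its p-value into a matrix:

```r
# Simulated data; all names here are hypothetical
set.seed(42)
sim <- data.frame(col1 = rnorm(100), x1 = rnorm(100),
                  x2 = rnorm(100), x3 = rnorm(100))
predvars <- c("x1", "x2", "x3")

# One single-predictor model per column, built the same way as in the answer
formlist <- lapply(predvars, reformulate, response = "col1")
lm_list <- lapply(formlist, lm, data = sim)
names(lm_list) <- predvars

# Pull the slope row (row 2; row 1 is the intercept) from each coefficient table
slope_row <- function(fit) {
    coef(summary(fit))[2, c("Estimate", "Pr(>|t|)")]
}

# sapply returns one column per model; transpose to get one row per predictor
results <- t(sapply(lm_list, slope_row))
results
```

The row index 2 relies on each model having exactly one numeric predictor; a factor predictor with several levels would produce more than one slope row and need different handling.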

huangapple
  • Posted on June 2, 2023, 03:57:55
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76385291.html