Using glm in R for linear regression on a large dataframe - issues with column subsetting
Question
I am trying to use glm in R on a data frame containing ~1000 columns. I want to select a specific independent variable and run the model in a loop over each of the ~1000 columns representing the dependent variables.
As a test, the glm call works perfectly fine when I specify single columns with df$col1 for both my dependent and independent variables.
However, I can't seem to correctly subset a range of columns (below), and I keep getting this error no matter how I format the df:
'data' must be a data.frame, environment, or list
What I tried:
df = my df
cols <- df[, 20:1112]
for (i in cols) {
glm <- glm(df$col1 ~ ., data=df, family=gaussian)
}
Answer 1
Score: 0
It would be more idiomatic to do:
```r
predvars <- names(df)[20:1112]
glm_list <- list()  ## presumably you want to save the results?
for (pv in predvars) {
  glm_list[[pv]] <- glm(reformulate(pv, response = "col1"),
                        data = df, family = gaussian)
}
```
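For context, a data frame is a list of columns, so `for (i in cols)` hands the loop body each column's values rather than its name, and `col1 ~ .` regresses on every other column in `data` regardless of `i`. The sketch below uses a small made-up data frame (`toy`, with hypothetical columns `col1`, `a`, `b` that are not from the question) to show this, and how `reformulate()` builds a one-predictor formula from a name. The last loop swaps the roles, on the assumption that the looped columns are meant to be the responses, as the question's wording suggests.

```r
# Toy data standing in for the real data frame (all names are hypothetical)
set.seed(42)
toy <- data.frame(col1 = rnorm(30), a = rnorm(30), b = rnorm(30))

# Looping over a data frame yields the column vectors themselves, not names
seen <- character(0)
for (i in toy[, c("a", "b")]) {
  seen <- c(seen, class(i))
}
print(seen)

# Looping over names lets reformulate() build one formula per column
f_pred <- reformulate("a", response = "col1")
print(f_pred)

# Roles swapped: each looped column is the response, col1 the predictor
fits <- list()
for (nm in c("a", "b")) {
  fits[[nm]] <- lm(reformulate("col1", response = nm), data = toy)
}
print(names(fits))
```

Which variant applies depends on which side of the formula the ~1000 columns belong on; only the arguments to `reformulate()` change.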
In fact, if you really just want a Gaussian GLM, it will be slightly faster to use `lm(reformulate(pv, response = "col1"), data = df)` in the loop instead.
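As a quick sanity check on that substitution, the sketch below fits both models to made-up data (`toy` and its column `x` are hypothetical, not from the question) and compares the coefficients. It assumes the default identity link for `family = gaussian`.

```r
# Toy data (hypothetical names) to compare the two fitting functions
set.seed(1)
toy <- data.frame(col1 = rnorm(50), x = rnorm(50))

fit_glm <- glm(col1 ~ x, data = toy, family = gaussian)
fit_lm  <- lm(col1 ~ x, data = toy)

# Both solve the same least-squares problem, so the estimates should agree
# up to numerical tolerance
same_coef <- isTRUE(all.equal(coef(fit_glm), coef(fit_lm)))
print(same_coef)
```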
If you want to get fancy:
```r
formlist <- lapply(predvars, reformulate, response = "col1")
lm_list <- lapply(formlist, lm, data = df)
names(lm_list) <- predvars
```
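Once the fits are stored in a named list, a common next step is to pull one row per model into a table. The sketch below is one way to do that, assuming the goal is the slope estimate and its p-value. The data frame `toy` and its columns `x1` to `x3` are made up for illustration, and the code relies on each model having a single numeric predictor, so row 2 of the coefficient table is that predictor.

```r
# Toy data (hypothetical names) and the same lapply pattern as above
set.seed(7)
toy <- data.frame(col1 = rnorm(40), x1 = rnorm(40),
                  x2 = rnorm(40), x3 = rnorm(40))
predvars <- names(toy)[2:4]
formlist <- lapply(predvars, reformulate, response = "col1")
lm_list <- lapply(formlist, lm, data = toy)
names(lm_list) <- predvars

# Row 1 of the coefficient table is the intercept, row 2 the predictor
slope_table <- t(sapply(lm_list, function(fit) {
  coef(summary(fit))[2, c("Estimate", "Pr(>|t|)")]
}))
print(slope_table)
```

With around a thousand models, the resulting p-values would usually need a multiple-testing adjustment such as `p.adjust()` before interpretation.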