February 14, 2023, 21:20:28


"variable lengths differ" error while running regressions in a loop

Question


I am trying to run a regression loop based on code I found in a previous answer (https://stackoverflow.com/q/27952653/21208453), but I keep getting an error. My outcomes (dependent variables) are 940 metabolites, and my exposures (independent variables) are "bmi", "Age", "sex", "lpa2c", and "smoking", where BMI and Age are continuous. BMI is the main exposure; I am controlling for the others.
So I'm testing the effect of BMI on 940 metabolites.
I would also like to know how to extract the coefficient, p-value, standard error, and confidence interval for BMI only, and only when it is significant.

This is the code I have used:

y <- c(1653:2592) # response
x1 <- c("bmi", "Age", "sex", "lpa2c", "smoking") # predictor
for (i in x1){
  model <- lm(paste("y ~", i[[1]]), data = QBB_clean)
  print(summary(model))
}

And this is the error:

> Error in model.frame.default(formula = paste("y ~", i[[1]]), data = QBB_clean, :
>   variable lengths differ (found for 'bmi')

              y1         y2          y3          y4 bmi age sex       lpa2c smoking
1   0.2875775201 0.59998896 0.238726027 0.784575267  24  18   1 0.470681834       1
2   0.7883051354 0.33282354 0.962358936 0.009429905  12  20   0 0.365845473       1
3   0.4089769218 0.48861303 0.601365726 0.779065883  18  15   0 0.121272054       0
4   0.8830174040 0.95447383 0.515029727 0.729390652  16  21   0 0.046993681       0
5   0.9404672843 0.48290240 0.402573342 0.630131853  18  28   1 0.262796304       1
6   0.0455564994 0.89035022 0.880246541 0.480910830  13  13   0 0.968641168       1
7   0.5281054880 0.91443819 0.364091865 0.156636851  11  12   0 0.488495482       1
8   0.8924190444 0.60873498 0.288239281 0.008215520  21  23   0 0.477822030       0
9   0.5514350145 0.41068978 0.170645235 0.452458394  18  17   1 0.748792881       0
10  0.4566147353 0.14709469 0.172171746 0.492293329  20  15   1 0.667640231       1

Answer 1

Score: 1


If you want to loop over responses you will want something like this:

respvars <- names(QBB_clean[1653:2592])
predvars <- c("bmi", "Age", "sex", "lpa2c", "smoking")
results <- list()
for (v in respvars) {
  form <- reformulate(predvars, response = v)
  results[[v]] <- lm(form, data = QBB_clean)
}

You can then print the results with something like lapply(results, summary), extract coefficients, etc. (I have a little trouble seeing how useful it will be to just print the results of 940 regressions... are you really going to inspect them all?)
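For anyone who wants to try this without the real data, here is a minimal sketch on simulated data. The data frame `toy` and its four outcome columns are made up for illustration and only stand in for `QBB_clean`. It first reproduces the original error (a numeric index vector `y` is not a column of the data, so its length does not match `bmi`), then runs the loop over outcome column names.

```r
# Toy stand-in for QBB_clean: 4 outcome columns instead of 940 (simulated data).
set.seed(1)
n <- 50
toy <- data.frame(
  y1 = rnorm(n), y2 = rnorm(n), y3 = rnorm(n), y4 = rnorm(n),
  bmi = rnorm(n, 25, 4), Age = rnorm(n, 40, 10),
  sex = rbinom(n, 1, 0.5), lpa2c = runif(n), smoking = rbinom(n, 1, 0.3)
)

# The pattern from the question: y is a vector of column positions (length 4),
# not a column of toy, so lm() tries to pair 4 values with 50 bmi values.
y <- 1:4
err <- tryCatch(
  lm(as.formula(paste("y ~", "bmi")), data = toy),
  error = conditionMessage
)
print(err)

# Looping over outcome column names instead, with all covariates in each model.
respvars <- names(toy)[1:4]
predvars <- c("bmi", "Age", "sex", "lpa2c", "smoking")
results <- list()
for (v in respvars) {
  results[[v]] <- lm(reformulate(predvars, response = v), data = toy)
}
print(names(results))
```

Each element of `results` is an ordinary `lm` fit named after its outcome column, so `summary(results[["y1"]])` works as usual.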

If you want coefficients etc. for BMI, I think this should work (not tested):

t(sapply(results, function(m) coef(summary(m))["bmi",]))

Or for confidence intervals:

t(sapply(results, function(m) confint(m)["bmi",]))
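The two extraction calls above could be combined into one table and then filtered to the significant outcomes, which is what the question asks for. The following is only a sketch on simulated data: `toy`, the helper `bmi_row`, and the 0.05 threshold `alpha` are illustrative assumptions, not part of the original answer.

```r
# Simulated stand-in for QBB_clean; y1 is built to depend on bmi, y2 and y3 are noise.
set.seed(42)
n <- 200
toy <- data.frame(
  bmi = rnorm(n, 25, 4), Age = rnorm(n, 40, 10),
  sex = rbinom(n, 1, 0.5), lpa2c = runif(n), smoking = rbinom(n, 1, 0.3)
)
toy$y1 <- 0.5 * toy$bmi + rnorm(n)
toy$y2 <- rnorm(n)
toy$y3 <- rnorm(n)

predvars <- c("bmi", "Age", "sex", "lpa2c", "smoking")
results <- lapply(c(y1 = "y1", y2 = "y2", y3 = "y3"), function(v) {
  lm(reformulate(predvars, response = v), data = toy)
})

# Hypothetical helper: bmi estimate, standard error, p-value and 95% CI for one fit.
bmi_row <- function(m) {
  est <- coef(summary(m))["bmi", ]
  ci <- confint(m, "bmi", level = 0.95)
  c(estimate = est[["Estimate"]], std_error = est[["Std. Error"]],
    p_value = est[["Pr(>|t|)"]], ci_low = ci[[1, 1]], ci_high = ci[[1, 2]])
}

# One row per outcome, one column per statistic.
bmi_tab <- as.data.frame(t(sapply(results, bmi_row)))

# Keep only outcomes where the bmi coefficient is significant at the chosen level.
alpha <- 0.05  # illustrative threshold
significant <- bmi_tab[bmi_tab$p_value < alpha, ]
print(significant)
```

With the real data, `results` would be the list built in the loop above, and `bmi_tab` would have one row per metabolite.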
