How to name a term created in the formula when calling `lm()`?


Question


Is it possible to name a term created in a formula? This is the scenario:

Create a toy dataset:

```r
set.seed(67253)
n <- 100
x <- sample(c("A", "B", "C"), size = n, replace = TRUE)
y <- sapply(x, switch, A = 0, B = 2, C = 1) + rnorm(n, 2)
dat <- data.frame(x, y)
head(dat)
#>   x         y
#> 1 B 4.5014474
#> 2 C 4.0252796
#> 3 C 2.4958761
#> 4 C 0.6725571
#> 5 B 4.3364206
#> 6 C 3.9798909
```

Fit a regression model:

```r
out <- lm(y ~ x, dat)
summary(out)
#>
#> Call:
#> lm(formula = y ~ x, data = dat)
#>
#> Residuals:
#>      Min       1Q   Median       3Q      Max
#> -2.07296 -0.52161 -0.03713  0.53898  2.12497
#>
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)
#> (Intercept)   2.1138     0.1726  12.244  < 2e-16 ***
#> xB            1.6772     0.2306   7.274 9.04e-11 ***
#> xC            0.5413     0.2350   2.303   0.0234 *
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 0.9297 on 97 degrees of freedom
#> Multiple R-squared: 0.3703, Adjusted R-squared: 0.3573
#> F-statistic: 28.52 on 2 and 97 DF, p-value: 1.808e-10
```

Fit the model again, but use "C" as the reference group:

```r
out2 <- lm(y ~ relevel(factor(x), ref = "C"), dat)
summary(out2)
#>
#> Call:
#> lm(formula = y ~ relevel(factor(x), ref = "C"), data = dat)
#>
#> Residuals:
#>      Min       1Q   Median       3Q      Max
#> -2.07296 -0.52161 -0.03713  0.53898  2.12497
#>
#> Coefficients:
#>                                Estimate Std. Error t value Pr(>|t|)
#> (Intercept)                      2.6551     0.1594  16.653  < 2e-16 ***
#> relevel(factor(x), ref = "C")A  -0.5413     0.2350  -2.303   0.0234 *
#> relevel(factor(x), ref = "C")B   1.1359     0.2209   5.143 1.41e-06 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 0.9297 on 97 degrees of freedom
#> Multiple R-squared: 0.3703, Adjusted R-squared: 0.3573
#> F-statistic: 28.52 on 2 and 97 DF, p-value: 1.808e-10
```

The variable `x` was re-leveled in the second call to `lm()`. Because this is done in the formula, the term is named after the whole expression, `relevel(factor(x), ref = "C")`, and that expression is repeated in every coefficient name.
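
One way to confirm where the long name comes from is to inspect the fitted model directly. The sketch below rebuilds the question's toy data so it runs on its own and uses only base R: `relevel()` merely moves "C" to the front of the levels, `terms()` stores the formula expression, deparsed as written, as the term label, and the model-matrix columns (hence the coefficients) are named by appending each non-reference level to that label.

```r
# Rebuild the question's toy data so this snippet runs on its own.
set.seed(67253)
n <- 100
x <- sample(c("A", "B", "C"), size = n, replace = TRUE)
y <- sapply(x, switch, A = 0, B = 2, C = 1) + rnorm(n, 2)
dat <- data.frame(x, y)

out2 <- lm(y ~ relevel(factor(x), ref = "C"), dat)

# relevel() only reorders the levels so that "C" becomes the reference.
print(levels(relevel(factor(dat$x), ref = "C")))
# "C" "A" "B"

# The term label is the expression deparsed exactly as written in the formula.
cat(attr(terms(out2), "term.labels"), sep = "\n")
# relevel(factor(x), ref = "C")

# Each coefficient name is that label with a non-reference level appended.
cat(names(coef(out2)), sep = "\n")
# (Intercept)
# relevel(factor(x), ref = "C")A
# relevel(factor(x), ref = "C")B
```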

Certainly, we can create the term before calling lm(), e.g.:

```r
dat$x2 <- relevel(factor(x), ref = "C")
out3 <- lm(y ~ x2, dat)
summary(out3)
#>
#> Call:
#> lm(formula = y ~ x2, data = dat)
#>
#> Residuals:
#>      Min       1Q   Median       3Q      Max
#> -2.07296 -0.52161 -0.03713  0.53898  2.12497
#>
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)
#> (Intercept)   2.6551     0.1594  16.653  < 2e-16 ***
#> x2A          -0.5413     0.2350  -2.303   0.0234 *
#> x2B           1.1359     0.2209   5.143 1.41e-06 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 0.9297 on 97 degrees of freedom
#> Multiple R-squared: 0.3703, Adjusted R-squared: 0.3573
#> F-statistic: 28.52 on 2 and 97 DF, p-value: 1.808e-10
```

However, can I create a term and name it in the formula? If yes, how?

Answer 1

Score: 1

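Adapted from the information in this comment: https://stackoverflow.com/questions/26870664/rename-model-terms-in-lm-object-for-forecasting#comment42302348_26870664

The idea is to create and name the variable in the `data` argument with `transform()`, so it still happens within the `lm()` call: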

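```r
# Toy data from the question
set.seed(67253)
n <- 100
x <- sample(c("A", "B", "C"), size = n, replace = TRUE)
y <- sapply(x, switch, A = 0, B = 2, C = 1) + rnorm(n, 2)
dat <- data.frame(x, y)

out <- lm(y ~ x, dat)
summary(out)

# Create and name the re-leveled factor inline: transform() returns a copy of
# dat with an extra column x2, so the coefficients are labelled x2A and x2B
out2 <- lm(y ~ x2, transform(dat,
                             x2 = relevel(factor(x), ref = "C")))
summary(out2)
```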
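
To check that the `transform()` version is the same model with friendlier labels, both variants can be fitted side by side. This is a sketch using only base R; `fit_formula` and `fit_named` are illustrative names, not from the original answer.

```r
# Rebuild the question's toy data so this snippet runs on its own.
set.seed(67253)
n <- 100
x <- sample(c("A", "B", "C"), size = n, replace = TRUE)
y <- sapply(x, switch, A = 0, B = 2, C = 1) + rnorm(n, 2)
dat <- data.frame(x, y)

# Term built inside the formula: every coefficient name repeats the expression.
fit_formula <- lm(y ~ relevel(factor(x), ref = "C"), dat)

# Term built and named in the data argument via transform().
fit_named <- lm(y ~ x2, transform(dat, x2 = relevel(factor(x), ref = "C")))

cat(names(coef(fit_named)), sep = "\n")
# (Intercept)
# x2A
# x2B

# Same design matrix, so the estimates agree once the names are dropped.
print(isTRUE(all.equal(unname(coef(fit_formula)), unname(coef(fit_named)))))
# TRUE

# transform() returned a modified copy; dat itself has no x2 column.
print("x2" %in% names(dat))
# FALSE
```

Strictly speaking, the term is still created outside the formula, in the `data` argument, but within the same `lm()` call; the `Call:` line printed by `summary()` records the full `transform(...)` expression, and, unlike `dat$x2 <- ...`, the original data frame is left untouched.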


  • Posted by huangapple on 1 June 2023, 17:54:14
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76380693.html