How to name a term created in the formula when calling `lm()`?

Question

Is it possible to name a term created in a formula? This is the scenario:

Create a toy dataset:

set.seed(67253)
n <- 100
x <- sample(c("A", "B", "C"), size = n, replace = TRUE)
y <- sapply(x, switch, A = 0, B = 2, C = 1) + rnorm(n, 2)
dat <- data.frame(x, y)
head(dat)
#>   x         y
#> 1 B 4.5014474
#> 2 C 4.0252796
#> 3 C 2.4958761
#> 4 C 0.6725571
#> 5 B 4.3364206
#> 6 C 3.9798909

Fit a regression model:

out <- lm(y ~ x, dat)
summary(out)
#> 
#> Call:
#> lm(formula = y ~ x, data = dat)
#> 
#> Residuals:
#>      Min       1Q   Median       3Q      Max 
#> -2.07296 -0.52161 -0.03713  0.53898  2.12497 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)   2.1138     0.1726  12.244  < 2e-16 ***
#> xB            1.6772     0.2306   7.274 9.04e-11 ***
#> xC            0.5413     0.2350   2.303   0.0234 *  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 0.9297 on 97 degrees of freedom
#> Multiple R-squared:  0.3703, Adjusted R-squared:  0.3573 
#> F-statistic: 28.52 on 2 and 97 DF,  p-value: 1.808e-10

Fit the model again, but use "C" as the reference group:

out2 <- lm(y ~ relevel(factor(x), ref = "C"), dat)
summary(out2)
#> 
#> Call:
#> lm(formula = y ~ relevel(factor(x), ref = "C"), data = dat)
#> 
#> Residuals:
#>      Min       1Q   Median       3Q      Max 
#> -2.07296 -0.52161 -0.03713  0.53898  2.12497 
#> 
#> Coefficients:
#>                                Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)                      2.6551     0.1594  16.653  < 2e-16 ***
#> relevel(factor(x), ref = "C")A  -0.5413     0.2350  -2.303   0.0234 *  
#> relevel(factor(x), ref = "C")B   1.1359     0.2209   5.143 1.41e-06 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 0.9297 on 97 degrees of freedom
#> Multiple R-squared:  0.3703, Adjusted R-squared:  0.3573 
#> F-statistic: 28.52 on 2 and 97 DF,  p-value: 1.808e-10

The variable x was releveled in the second call to lm(). Because this was done inside the formula, the term is named relevel(factor(x), ref = "C").
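A quick way to see where these labels come from, as a sketch that reuses the question's toy data: the model's `terms()` object stores each term label as the deparsed text of its expression in the formula, and the dummy columns (hence the coefficients) are named by appending each non-reference level to that label. The names `term_labels` and `coef_names` are only illustrative.

```r
# Recreate the toy data from the question
set.seed(67253)
n <- 100
x <- sample(c("A", "B", "C"), size = n, replace = TRUE)
y <- sapply(x, switch, A = 0, B = 2, C = 1) + rnorm(n, 2)
dat <- data.frame(x, y)

out2 <- lm(y ~ relevel(factor(x), ref = "C"), dat)

# Each term is labelled with the deparsed text of its expression in the formula
term_labels <- attr(terms(out2), "term.labels")
print(term_labels)

# Coefficient names append each non-reference level to that term label
coef_names <- names(coef(out2))
print(coef_names)
```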

Certainly, we can create the term before calling lm(), e.g.:

dat$x2 <- relevel(factor(x), ref = "C")
out3 <- lm(y ~ x2, dat)
summary(out3)
#> 
#> Call:
#> lm(formula = y ~ x2, data = dat)
#> 
#> Residuals:
#>      Min       1Q   Median       3Q      Max 
#> -2.07296 -0.52161 -0.03713  0.53898  2.12497 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)   2.6551     0.1594  16.653  < 2e-16 ***
#> x2A          -0.5413     0.2350  -2.303   0.0234 *  
#> x2B           1.1359     0.2209   5.143 1.41e-06 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 0.9297 on 97 degrees of freedom
#> Multiple R-squared:  0.3703, Adjusted R-squared:  0.3573 
#> F-statistic: 28.52 on 2 and 97 DF,  p-value: 1.808e-10
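A small sanity check, sketched on the same toy data (`same_coefs`, `same_fit` and `group_means` are illustrative names): releveling inside the formula and pre-creating `x2` give the same estimates with different labels. Releveling also only reparameterizes the model relative to the default fit with reference level A: the fitted values do not change, and under R's default treatment contrasts the intercept is the mean of the reference group.

```r
set.seed(67253)
n <- 100
x <- sample(c("A", "B", "C"), size = n, replace = TRUE)
y <- sapply(x, switch, A = 0, B = 2, C = 1) + rnorm(n, 2)
dat <- data.frame(x, y)

out  <- lm(y ~ x, dat)                                # reference level A (default)
out2 <- lm(y ~ relevel(factor(x), ref = "C"), dat)    # releveled inside the formula
dat$x2 <- relevel(factor(x), ref = "C")
out3 <- lm(y ~ x2, dat)                               # releveled beforehand

# Same estimates for out2 and out3; only the coefficient names differ
same_coefs <- isTRUE(all.equal(unname(coef(out2)), unname(coef(out3))))
print(same_coefs)

# Changing the reference level does not change the fitted values
same_fit <- isTRUE(all.equal(unname(fitted(out)), unname(fitted(out3))))
print(same_fit)

# With treatment contrasts, the intercept is the mean of the reference group
group_means <- tapply(dat$y, dat$x, mean)
print(group_means)
print(coef(out3)[["(Intercept)"]])
```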

However, can I create a term and name it in the formula? If yes, how?

Answer 1

Score: 1

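Adapted from the information in this comment: https://stackoverflow.com/questions/26870664/rename-model-terms-in-lm-object-for-forecasting#comment42302348_26870664

Rather than naming the term inside the formula, create and name it in the data argument with transform(), which returns a modified copy of dat: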

set.seed(67253)
n <- 100
x <- sample(c("A", "B", "C"), size = n, replace = TRUE)
y <- sapply(x, switch, A = 0, B = 2, C = 1) + rnorm(n, 2)
dat <- data.frame(x, y)

out <- lm(y ~ x, dat)
summary(out)

# x2 is created and named on the fly; dat itself is left unchanged
out2 <- lm(y ~ x2, data = transform(dat, x2 = relevel(factor(x), ref = "C")))
summary(out2)
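A short check of what this approach produces, sketched on the same toy data: the coefficients carry the chosen name `x2`, and because `transform()` works on a copy, `dat` gains no new column.

```r
set.seed(67253)
n <- 100
x <- sample(c("A", "B", "C"), size = n, replace = TRUE)
y <- sapply(x, switch, A = 0, B = 2, C = 1) + rnorm(n, 2)
dat <- data.frame(x, y)

out2 <- lm(y ~ x2, data = transform(dat, x2 = relevel(factor(x), ref = "C")))

# The chosen name x2 appears in the coefficient labels
print(names(coef(out2)))  # expected: "(Intercept)" "x2A" "x2B"

# transform() returned a modified copy, so dat has no x2 column
print("x2" %in% names(dat))  # expected: FALSE
```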


huangapple
  • Published on June 1, 2023 17:54:14
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76380693.html