2023-06-01 17:54:14

How to name a term created in the formula when calling `lm()`?

Question


Is it possible to name a term created in a formula? This is the scenario:

Create a toy dataset:

set.seed(67253)
n <- 100
x <- sample(c("A", "B", "C"), size = n, replace = TRUE)
y <- sapply(x, switch, A = 0, B = 2, C = 1) + rnorm(n, 2)
dat <- data.frame(x, y)
head(dat)
#>   x         y
#> 1 B 4.5014474
#> 2 C 4.0252796
#> 3 C 2.4958761
#> 4 C 0.6725571
#> 5 B 4.3364206
#> 6 C 3.9798909

Fit a regression model:

out <- lm(y ~ x, dat)
summary(out)
#> 
#> Call:
#> lm(formula = y ~ x, data = dat)
#> 
#> Residuals:
#>      Min       1Q   Median       3Q      Max 
#> -2.07296 -0.52161 -0.03713  0.53898  2.12497 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)   2.1138     0.1726  12.244  < 2e-16 ***
#> xB            1.6772     0.2306   7.274 9.04e-11 ***
#> xC            0.5413     0.2350   2.303   0.0234 *  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 0.9297 on 97 degrees of freedom
#> Multiple R-squared:  0.3703, Adjusted R-squared:  0.3573 
#> F-statistic: 28.52 on 2 and 97 DF,  p-value: 1.808e-10

Fit the model again, but use "C" as the reference group:

out2 <- lm(y ~ relevel(factor(x), ref = "C"), dat)
summary(out2)
#> 
#> Call:
#> lm(formula = y ~ relevel(factor(x), ref = "C"), data = dat)
#> 
#> Residuals:
#>      Min       1Q   Median       3Q      Max 
#> -2.07296 -0.52161 -0.03713  0.53898  2.12497 
#> 
#> Coefficients:
#>                                Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)                      2.6551     0.1594  16.653  < 2e-16 ***
#> relevel(factor(x), ref = "C")A  -0.5413     0.2350  -2.303   0.0234 *  
#> relevel(factor(x), ref = "C")B   1.1359     0.2209   5.143 1.41e-06 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 0.9297 on 97 degrees of freedom
#> Multiple R-squared:  0.3703, Adjusted R-squared:  0.3573 
#> F-statistic: 28.52 on 2 and 97 DF,  p-value: 1.808e-10

The variable, x, was re-leveled in the second call to lm(). This is done in the formula and so the name of this term is relevel(factor(x), ref = "C").
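To see where that long name comes from, the fitted object can be inspected directly. The sketch below rebuilds the toy data and looks at the term label and the coefficient names; the object names `fit_inline`, `term_labels`, and `coef_names` are only illustrative and do not appear in the original post.

```r
# Rebuild the toy data from the question
set.seed(67253)
n <- 100
x <- sample(c("A", "B", "C"), size = n, replace = TRUE)
y <- sapply(x, switch, A = 0, B = 2, C = 1) + rnorm(n, 2)
dat <- data.frame(x, y)

# Re-level inside the formula, as in the second fit
# (fit_inline is a hypothetical name used only in this sketch)
fit_inline <- lm(y ~ relevel(factor(x), ref = "C"), dat)

# The term label is the expression as written in the formula
term_labels <- attr(terms(fit_inline), "term.labels")
print(term_labels)

# Each coefficient name is the term label followed by a factor level
coef_names <- names(coef(fit_inline))
print(coef_names)
```

The term label is taken from the formula text itself, which is why any expression written inline ends up as a prefix of every coefficient name for that term.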

Certainly, we can create the term before calling lm(), e.g.:

dat$x2 <- relevel(factor(x), ref = "C")
out3 <- lm(y ~ x2, dat)
summary(out3)
#> 
#> Call:
#> lm(formula = y ~ x2, data = dat)
#> 
#> Residuals:
#>      Min       1Q   Median       3Q      Max 
#> -2.07296 -0.52161 -0.03713  0.53898  2.12497 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)   2.6551     0.1594  16.653  < 2e-16 ***
#> x2A          -0.5413     0.2350  -2.303   0.0234 *  
#> x2B           1.1359     0.2209   5.143 1.41e-06 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 0.9297 on 97 degrees of freedom
#> Multiple R-squared:  0.3703, Adjusted R-squared:  0.3573 
#> F-statistic: 28.52 on 2 and 97 DF,  p-value: 1.808e-10

However, can I create a term and name it in the formula? If yes, how?

Answer 1

Score: 1

Adapted from the information in this comment: https://stackoverflow.com/questions/26870664/rename-model-terms-in-lm-object-for-forecasting#comment42302348_26870664

set.seed(67253)
n <- 100
x <- sample(c("A", "B", "C"), size = n, replace = TRUE)
y <- sapply(x, switch, A = 0, B = 2, C = 1) + rnorm(n, 2)
dat <- data.frame(x, y)

out <- lm(y ~ x, dat)
summary(out)

out2 <- lm(y ~ x2, transform(dat, x2 = relevel(factor(x), ref = "C")))
summary(out2)
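As a quick check that naming the term through `transform()` changes only the labels and not the fit, the sketch below compares it with the inline `relevel()` version from the question. The object names `fit_inline`, `fit_named`, `same_estimates`, and `dat_unchanged` are illustrative and not part of the answer.

```r
# Rebuild the toy data from the question
set.seed(67253)
n <- 100
x <- sample(c("A", "B", "C"), size = n, replace = TRUE)
y <- sapply(x, switch, A = 0, B = 2, C = 1) + rnorm(n, 2)
dat <- data.frame(x, y)

# Term created inside the formula: the name is the whole expression
fit_inline <- lm(y ~ relevel(factor(x), ref = "C"), dat)

# Term created and named in the data argument via transform()
fit_named <- lm(y ~ x2, transform(dat, x2 = relevel(factor(x), ref = "C")))

# Coefficient names now use the short prefix x2
print(names(coef(fit_named)))

# The estimates agree; only the labels differ
same_estimates <- isTRUE(all.equal(unname(coef(fit_inline)),
                                   unname(coef(fit_named))))
print(same_estimates)

# transform() returns a modified copy, so dat has no x2 column
dat_unchanged <- !("x2" %in% names(dat))
print(dat_unchanged)
```

Strictly speaking, the term is named in the `data` argument rather than in the formula, but everything still happens within the single `lm()` call and `dat` is left untouched.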


