March 3, 2023 18:32:12


Why does the update() method not work when the estimation is wrapped in a function?

Question


Apparently the update() method cannot retrieve the dataset the estimation was based on if I wrap the estimation function in another function. Is there any way around this, e.g., by specifying an environment?

library(fixest)
data(trade)
# fit model directly and wrapped into function
mod1 <- fepois(Euros ~ log(dist_km) | Origin + Destination, trade)
fit_model <- function(df) {
  fepois(Euros ~ log(dist_km) | Origin + Destination, data = df)
}
mod2 <- fit_model(trade)
# try to update
update(mod1, . ~ . + log(Year))
#> Poisson estimation, Dep. Var.: Euros
#> Observations: 38,325 
#> Fixed-effects: Origin: 15,  Destination: 15
#> Standard-errors: Clustered (Origin) 
#>              Estimate Std. Error  t value  Pr(>|t|)    
#> log(dist_km) -1.51756   0.113171 -13.4095 < 2.2e-16 ***
#> log(Year)    72.36888   6.899699  10.4887 < 2.2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> Log-Likelihood: -1.212e+12   Adj. Pseudo R2: 0.592897
#>            BIC:  2.424e+12     Squared Cor.: 0.384441
update(mod2, . ~ . + log(Year))
#> Error in fepois(fml = Euros ~ log(dist_km) + log(Year) | Origin + Destination, : Argument 'data' must be either: i) a matrix, or ii) a data.frame.
#> Problem: it is not a matrix nor a data.frame (instead it is a function).

Created on 2023-02-26 with reprex v2.0.2
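The same failure can apparently be reproduced without fixest. The sketch below is a base-R analogue under an assumption: stats::lm() and the built-in mtcars data stand in for fepois() and trade. Like fepois(), lm() records the matched call, and update() re-evaluates that call from the frame it is called in. At the top level the name df there refers to the function stats::df rather than to a data frame, which matches the "instead it is a function" message above.

```r
# Assumption: lm() and mtcars stand in for fepois() and trade.
fit_direct <- lm(mpg ~ wt, data = mtcars)

fit_model <- function(df) {
  lm(mpg ~ wt, data = df)
}
fit_wrapped <- fit_model(mtcars)

# The stored call refers to the argument name `df`, not to `mtcars`
fit_wrapped$call
#> lm(formula = mpg ~ wt, data = df)

# update() re-evaluates that call at the top level, where `df` is stats::df
refit <- tryCatch(update(fit_wrapped, . ~ . + hp), error = function(e) e)
inherits(refit, "error")
#> [1] TRUE
```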

Also posted as a GitHub issue.

Update: The solution seems to be forcing an early evaluation of the expression that refers to the dataset. Another way is to specify the dataset again within update():

update(mod2, . ~ . + log(Year), data = trade)
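In the same base-R analogue (again assuming lm() and mtcars in place of fepois() and trade), passing data again works because update() writes any extra named argument into the stored call before re-evaluating it, so the stale data = df is overwritten.

```r
# Assumption: lm() and mtcars stand in for fepois() and trade.
fit_model <- function(df) {
  lm(mpg ~ wt, data = df)
}
fit_wrapped <- fit_model(mtcars)

# The extra `data =` argument replaces `data = df` in the stored call
fit_updated <- update(fit_wrapped, . ~ . + hp, data = mtcars)
fit_updated$call
#> lm(formula = mpg ~ wt + hp, data = mtcars)

# Same coefficients as fitting the larger model directly
all.equal(coef(fit_updated), coef(lm(mpg ~ wt + hp, data = mtcars)))
#> [1] TRUE
```

The drawback is that the caller has to remember which data set the model was fitted on; nothing in the model object enforces it.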

Answer 1

Score: 2


If you want to pass an arbitrary data frame into the function instead of hard-coding trade, we have to evaluate the data argument early, before calling fepois(). We can do this with eval(bquote()) by wrapping the data argument (mydat below) in .(). To capture the object name nicely, we can additionally wrap the data argument in substitute() before evaluating it early (thanks to @jay.sf for the comment).

Update: I have now added an env argument, which needs to be set to parent.frame() when fit_model() is used inside purrr::map() and similar functions.
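Stripped of fixest and purrr, the core of this approach can be sketched in base R, assuming lm() and mtcars stand in for fepois() and trade (fit_model_bq is a hypothetical name). substitute(mydat) returns the expression the caller typed, .() inside bquote() splices it into the call, and eval() then runs a call whose data argument is already the caller's object name, so that name is what lm() records.

```r
# Assumption: lm() and mtcars stand in for fepois() and trade;
# fit_model_bq is a hypothetical helper name.
fit_model_bq <- function(mydat) {
  # substitute(mydat) gives the caller's expression (here the symbol `mtcars`);
  # .() splices it into the call before lm() records it with match.call()
  eval(bquote(lm(mpg ~ wt, data = .(substitute(mydat)))))
}

fit_bq <- fit_model_bq(mtcars)
fit_bq$call
#> lm(formula = mpg ~ wt, data = mtcars)

# update() can now resolve the data name from the top level
fit_bq_upd <- update(fit_bq, . ~ . + hp)
```

Because substitute() captures an expression rather than a value, this only helps while that expression means the same thing in the frame update() is later called from.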

library(fixest)
library(tidyverse)
data(trade)
fit_model <- function(mydat, env = environment()) {
  eval(bquote(fepois(Euros ~ log(dist_km) | Origin + Destination, data = .(substitute(mydat, env = env)))))
}
mod2 <- fit_model(trade)
update(mod2, . ~ . + log(Year))
#> Poisson estimation, Dep. Var.: Euros
#> Observations: 38,325 
#> Fixed-effects: Origin: 15,  Destination: 15
#> Standard-errors: Clustered (Origin) 
#>              Estimate Std. Error  t value  Pr(>|t|)    
#> log(dist_km) -1.51756   0.113171 -13.4095 < 2.2e-16 ***
#> log(Year)    72.36888   6.899699  10.4887 < 2.2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> Log-Likelihood: -1.212e+12   Adj. Pseudo R2: 0.592897
#>            BIC:  2.424e+12     Squared Cor.: 0.384441
mod2$call
#> fepois(fml = Euros ~ log(dist_km) | Origin + Destination, data = trade)
res <- trade |>
  nest(.by = Year) |>
  mutate(fit = map(data, \(x) fit_model(x, parent.frame())))
res$fit[[1]]
#> Poisson estimation, Dep. Var.: Euros
#> Observations: 3,793 
#> Fixed-effects: Origin: 15,  Destination: 15
#> Standard-errors: Clustered (Origin) 
#>              Estimate Std. Error  t value  Pr(>|t|)    
#> log(dist_km) -1.48073   0.114878 -12.8896 < 2.2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> Log-Likelihood: -1.082e+11   Adj. Pseudo R2: 0.573982
#>            BIC:  2.164e+11     Squared Cor.: 0.352497
res$fit[[1]]$call
#> fepois(fml = Euros ~ log(dist_km) | Origin + Destination, data = mydat)

Created on 2023-03-07 by the reprex package (v2.0.1)
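In the map() case the recorded call still ends with data = mydat, a local name that update() cannot resolve from the top level. One alternative, swapped in here and not part of the answer, is to splice the data frame value itself into the call with .(mydat) instead of its name. A minimal base-R sketch, assuming lm() and split() over mtcars stand in for fepois() and nest()/map() over trade (fit_model_inline is a hypothetical name):

```r
# Assumption: lm() + split(mtcars, ...) stand in for fepois() + nest()/map();
# fit_model_inline is a hypothetical helper name.
fit_model_inline <- function(mydat) {
  # .(mydat) inserts the data frame itself (not a name) into the call,
  # so the stored call no longer depends on any variable lookup
  eval(bquote(lm(mpg ~ wt, data = .(mydat))))
}

# One model per cylinder group, analogous to nest(.by = Year) + map()
fits <- lapply(split(mtcars, mtcars$cyl), fit_model_inline)

# update() works from the top level because the data travels with the call
upd <- update(fits[["4"]], . ~ . + hp)
nobs(upd)
#> [1] 11
```

The price is that the whole data frame is embedded in the call, so printing or summarizing the model echoes it in full, and the object carries an extra copy of the data.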

Answer 2

Score: 1


The problem is that the call looks like this:

mod2$call
# fepois(fml = Euros ~ log(dist_km) | Origin + Destination, data = df)

where the data argument should be data = trade.
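One way to confirm this diagnosis before anything fails is to ask update() for the call it would run. The sketch assumes the base-R analogue with lm() and mtcars; evaluate = FALSE is an argument of stats::update() (update.default), and it is not assumed here that fixest's update method behaves identically.

```r
# Assumption: lm() and mtcars stand in for fepois() and trade.
fit_model <- function(df) {
  lm(mpg ~ wt, data = df)
}
fit_wrapped <- fit_model(mtcars)

# evaluate = FALSE returns the modified call instead of refitting
pending <- update(fit_wrapped, . ~ . + hp, evaluate = FALSE)
pending
#> lm(formula = mpg ~ wt + hp, data = df)

# Evaluated from the top level, `df` is stats::df, a function
is.function(eval(pending$data))
#> [1] TRUE
```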

You could use an eval-parse approach. It is a little hacky, but it works.

fit_model2 <- function(df) {
  eval(parse(text=sprintf('fepois(Euros ~ log(dist_km) | Origin + Destination, data = %s)', 
                          deparse(substitute(df)))))
}
mod2a <- fit_model2(trade)
mod2a$call
# fepois(fml = Euros ~ log(dist_km) | Origin + Destination, data = trade)
update(mod2a, . ~ . + log(Year))
# Poisson estimation, Dep. Var.: Euros
# Observations: 38,325 
# Fixed-effects: Origin: 15,  Destination: 15
# Standard-errors: Clustered (Origin) 
#              Estimate Std. Error  t value  Pr(>|t|)    
# log(dist_km) -1.51756   0.113171 -13.4095 < 2.2e-16 ***
# log(Year)    72.36888   6.899699  10.4887 < 2.2e-16 ***
# ---
# Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Log-Likelihood: -1.212e+12   Adj. Pseudo R2: 0.592897
#            BIC:  2.424e+12     Squared Cor.: 0.384441
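The same rewrite can likely be done without converting code to a string and parsing it back: substitute() with a list as its second argument replaces a placeholder inside an expression with the caller's expression. This avoids cases where deparse() splits a long argument expression over several strings and the sprintf()/parse() round trip breaks. A sketch under the base-R assumption (lm() and mtcars in place of fepois() and trade; DATA and fit_model_sub are hypothetical names):

```r
# Assumption: lm() and mtcars stand in for fepois() and trade;
# DATA is a hypothetical placeholder, fit_model_sub a hypothetical name.
fit_model_sub <- function(df) {
  # Replace the placeholder DATA with the caller's expression for `df`
  cl <- substitute(lm(mpg ~ wt, data = DATA), list(DATA = substitute(df)))
  # Evaluate in the caller's frame, where that expression was written
  eval(cl, parent.frame())
}

fit_sub <- fit_model_sub(mtcars)
fit_sub$call
#> lm(formula = mpg ~ wt, data = mtcars)
fit_sub_upd <- update(fit_sub, . ~ . + hp)
```

Evaluating in parent.frame() resolves the data name where the user wrote it, rather than inside fit_model_sub().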

Answer 3

Score: 0


Try replacing df with trade in the fit_model function, since fepois does not recognize the df data this way:

fit_model <- function(trade) {
  fepois(Euros ~ log(dist_km) | Origin + Destination, data = trade)
}
mod2 <- fit_model(trade)
update(mod2, . ~ . + log(Year))
Poisson estimation, Dep. Var.: Euros
Observations: 38,325 
Fixed-effects: Origin: 15,  Destination: 15
Standard-errors: Clustered (Origin) 
             Estimate Std. Error  t value  Pr(>|t|)    
log(dist_km) -1.51756   0.113171 -13.4095 < 2.2e-16 ***
log(Year)    72.36888   6.899699  10.4887 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Log-Likelihood: -1.212e+12   Adj. Pseudo R2: 0.592897
           BIC:  2.424e+12     Squared Cor.: 0.384441
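This appears to work because the argument and the global object share the name trade, so the recorded call data = trade also resolves when update() re-evaluates it at the top level. If the function is called with a differently named object, update() may silently refit on whatever the top-level object of that name is, without any error. A base-R sketch of that pitfall, assuming lm() with mtcars in place of fepois() with trade (four_cyl is a hypothetical subset):

```r
# Assumption: lm() and mtcars stand in for fepois() and trade.
# The argument deliberately shares its name with the built-in data set
fit_model <- function(mtcars) {
  lm(mpg ~ wt, data = mtcars)
}

four_cyl <- subset(mtcars, cyl == 4)  # hypothetical subset passed in
fit <- fit_model(four_cyl)
nobs(fit)
#> [1] 11

# The stored call says data = mtcars, which at the top level is the full data
upd <- update(fit, . ~ . + hp)
nobs(upd)
#> [1] 32
```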
