Why does the update() method not work when the estimation is wrapped in a function?
Question
Apparently, the update() method cannot retrieve the dataset the estimation was based on if I wrap the estimation function in another function. Is there any way around this, e.g., by specifying an environment?
library(fixest)
data(trade)
# fit model directly and wrapped into function
mod1 <- fepois(Euros ~ log(dist_km) | Origin + Destination, trade)
fit_model <- function(df) {
  fepois(Euros ~ log(dist_km) | Origin + Destination, data = df)
}
mod2 <- fit_model(trade)
# try to update
update(mod1, . ~ . + log(Year))
#> Poisson estimation, Dep. Var.: Euros
#> Observations: 38,325
#> Fixed-effects: Origin: 15, Destination: 15
#> Standard-errors: Clustered (Origin)
#> Estimate Std. Error t value Pr(>|t|)
#> log(dist_km) -1.51756 0.113171 -13.4095 < 2.2e-16 ***
#> log(Year) 72.36888 6.899699 10.4887 < 2.2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> Log-Likelihood: -1.212e+12 Adj. Pseudo R2: 0.592897
#> BIC: 2.424e+12 Squared Cor.: 0.384441
update(mod2, . ~ . + log(Year))
#> Error in fepois(fml = Euros ~ log(dist_km) + log(Year) | Origin + Destination, : Argument 'data' must be either: i) a matrix, or ii) a data.frame.
#> Problem: it is not a matrix nor a data.frame (instead it is a function).
Created on 2023-02-26 with reprex v2.0.2
Also posted as a GitHub issue.
Update: The solution seems to be forcing an early evaluation of the expression that refers to the dataset. Another way is to specify the dataset again within update():
update(mod2, . ~ . + log(Year), data = trade)
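As a minimal sketch of this second workaround, the snippet below uses base R's lm() and the built-in mtcars data as stand-ins for fepois() and trade (an assumption, made so that it runs without extra packages). Passing data again to update() overrides the data argument stored in the call:

```r
# Stand-ins (assumptions): lm() for fepois(), mtcars for trade.
fit_model <- function(df) {
  lm(mpg ~ wt, data = df)
}
mod <- fit_model(mtcars)

# Supplying `data` explicitly replaces the stored `data = df` before re-fitting.
mod_up <- update(mod, . ~ . + hp, data = mtcars)
names(coef(mod_up))
```

The caller has to remember which dataset each model was fitted on, which is the main drawback compared with fixing the stored call itself.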
Answer 1
Score: 2
If you want to pass an arbitrary df into the function and not hard-code trade, we have to evaluate it early, before calling fepois(). We can do this with eval(bquote()) and wrap the data argument (mydat below) in .(). To capture the object name nicely, we can further wrap the data argument in substitute() before evaluating it early (thanks to @jay.sf for the comment).
Update: I have now added an env argument, which needs to be set to parent.frame() when the function is used inside purrr::map() and similar functions.
library(fixest)
library(tidyverse)
data(trade)
fit_model <- function(mydat, env = environment()) {
  eval(bquote(
    fepois(Euros ~ log(dist_km) | Origin + Destination,
           data = .(substitute(mydat, env = env)))
  ))
}
mod2 <- fit_model(trade)
update(mod2, . ~ . + log(Year))
#> Poisson estimation, Dep. Var.: Euros
#> Observations: 38,325
#> Fixed-effects: Origin: 15, Destination: 15
#> Standard-errors: Clustered (Origin)
#> Estimate Std. Error t value Pr(>|t|)
#> log(dist_km) -1.51756 0.113171 -13.4095 < 2.2e-16 ***
#> log(Year) 72.36888 6.899699 10.4887 < 2.2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> Log-Likelihood: -1.212e+12 Adj. Pseudo R2: 0.592897
#> BIC: 2.424e+12 Squared Cor.: 0.384441
mod2$call
#> fepois(fml = Euros ~ log(dist_km) | Origin + Destination, data = trade)
res <- trade |>
  nest(.by = Year) |>
  mutate(fit = map(data, \(x) fit_model(x, parent.frame())))
res$fit[[1]]
#> Poisson estimation, Dep. Var.: Euros
#> Observations: 3,793
#> Fixed-effects: Origin: 15, Destination: 15
#> Standard-errors: Clustered (Origin)
#> Estimate Std. Error t value Pr(>|t|)
#> log(dist_km) -1.48073 0.114878 -12.8896 < 2.2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> Log-Likelihood: -1.082e+11 Adj. Pseudo R2: 0.573982
#> BIC: 2.164e+11 Squared Cor.: 0.352497
res$fit[[1]]$call
#> fepois(fml = Euros ~ log(dist_km) | Origin + Destination, data = mydat)
Created on 2023-03-07 by the reprex package (v2.0.1)
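The early-evaluation idea can also be sketched in base R alone. In the snippet below, lm() and mtcars are stand-ins for fepois() and trade, and data_expr is a hypothetical helper name. It is only an illustration of the substitute()/bquote() mechanism, not the fixest-specific code above:

```r
# Stand-ins (assumptions): lm() for fepois(), mtcars for trade.
fit_model <- function(mydat) {
  # substitute() recovers the expression the caller typed (here the symbol `mtcars`).
  data_expr <- substitute(mydat)
  # bquote() splices that symbol into the call, and eval() then runs the call.
  eval(bquote(lm(mpg ~ wt, data = .(data_expr))))
}

mod <- fit_model(mtcars)
mod$call$data                       # the symbol `mtcars`, not `mydat`

# update() can now re-evaluate the stored call from the top level.
mod_up <- update(mod, . ~ . + hp)
names(coef(mod_up))
```

Because the stored call names the caller's object, update() only works where that name is still visible, for example at the top level.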
Answer 2
Score: 1
The problem is that the call looks like this:
mod2$call
# fepois(fml = Euros ~ log(dist_km) | Origin + Destination, data = df)
where the data argument should be data = trade.
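To see why the stored name matters, here is a small base-R sketch in which lm() and mtcars stand in for fepois() and trade (an assumption, made so that it runs without fixest). It inspects the stored call and shows that, outside the wrapper, the name df resolves to the F-distribution density function stats::df rather than to a dataset:

```r
# Stand-ins (assumptions): lm() for fepois(), mtcars for trade.
fit_model <- function(df) {
  lm(mpg ~ wt, data = df)
}
mod <- fit_model(mtcars)

# The stored call keeps the argument name used inside the wrapper.
stored_data_arg <- mod$call$data
print(stored_data_arg)

# Outside the wrapper, `df` is found in the stats package and is a function.
resolves_to_function <- is.function(eval(stored_data_arg, globalenv()))

# update() re-evaluates the stored call, so it fails in the same way.
res <- tryCatch(update(mod, . ~ . + hp), error = function(e) e)
inherits(res, "error")
```

This matches the "instead it is a function" part of the error message in the question.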
You could use an eval-parse approach. It is a little hacky, but it works.
fit_model2 <- function(df) {
  eval(parse(text = sprintf('fepois(Euros ~ log(dist_km) | Origin + Destination, data = %s)',
                            deparse(substitute(df)))))
}
mod2a <- fit_model2(trade)
mod2a$call
# fepois(fml = Euros ~ log(dist_km) | Origin + Destination, data = trade)
update(mod2a, . ~ . + log(Year))
# Poisson estimation, Dep. Var.: Euros
# Observations: 38,325
# Fixed-effects: Origin: 15, Destination: 15
# Standard-errors: Clustered (Origin)
# Estimate Std. Error t value Pr(>|t|)
# log(dist_km) -1.51756 0.113171 -13.4095 < 2.2e-16 ***
# log(Year) 72.36888 6.899699 10.4887 < 2.2e-16 ***
# ---
# Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Log-Likelihood: -1.212e+12 Adj. Pseudo R2: 0.592897
# BIC: 2.424e+12 Squared Cor.: 0.384441
Answer 3
Score: 0
Try renaming the df argument to trade in the fit_model function, since the name df stored in the call is not resolved to the dataset when update() re-runs it:
fit_model <- function(trade) {
  fepois(Euros ~ log(dist_km) | Origin + Destination, data = trade)
}
mod2 <- fit_model(trade)
update(mod2, . ~ . + log(Year))
#> Poisson estimation, Dep. Var.: Euros
#> Observations: 38,325
#> Fixed-effects: Origin: 15, Destination: 15
#> Standard-errors: Clustered (Origin)
#> Estimate Std. Error t value Pr(>|t|)
#> log(dist_km) -1.51756 0.113171 -13.4095 < 2.2e-16 ***
#> log(Year) 72.36888 6.899699 10.4887 < 2.2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> Log-Likelihood: -1.212e+12 Adj. Pseudo R2: 0.592897
#> BIC: 2.424e+12 Squared Cor.: 0.384441
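One caveat with this renaming approach is that it relies on the argument name matching an object of the same name in the calling environment. The base-R sketch below (lm() and mtcars as stand-ins for fepois() and trade, and small as a hypothetical subset) suggests what can happen when a different dataset is passed in: update() re-fits on the global object instead, without any error.

```r
# Stand-ins (assumptions): lm() for fepois(), mtcars for trade.
fit_model <- function(mtcars) {
  lm(mpg ~ wt, data = mtcars)
}

small <- mtcars[1:10, ]          # hypothetical subset passed to the wrapper
mod <- fit_model(small)
n_before <- nobs(mod)            # 10: fitted on the subset

# The stored call says `data = mtcars`, so update() picks up the full dataset.
mod_up <- update(mod, . ~ . + hp)
n_after <- nobs(mod_up)          # 32: all rows of mtcars
c(n_before, n_after)
```

Checking the number of observations before and after update() is a cheap way to catch this kind of silent mismatch.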