英文:
Does the `by` argument in `avg_comparisons` compute the strata specific marginal effect?
问题
我正在分析我们刚刚完成的AB测试的数据。我们的结果是二元的,y
,并且我们已经按第三个变量 g
进行了分层结果。
因为干预可能因 g
而异,我已经使用鲁棒协方差估计拟合了泊松回归,如下所示:
library(tidyverse)
library(sandwich)
library(marginaleffects)
fit <- glm(y ~ treatment * g, data=model_data, family=poisson, offset=log(n_users))
从这里,我想要了解各层特定的因果风险比(通常在行业中称为 "lift")。我的方法是使用 avg_comparisons
如下所示:
avg_comparisons(fit,
variables = 'treatment',
newdata = model_data,
transform_pre = 'lnratioavg',
transform_post = exp,
by=c('g'),
vcov = 'HC')
结果似乎与当我按 g
中的组筛选数据时计算的 "lift" 一致。
问题
通过传递 by=c('g')
,我是否实际上正在计算我怀疑的各层特定的风险比?是否有任何隐藏的 "陷阱" 或我未考虑到的事情?
如果需要,我可以提供数据和一个最小的工作示例。
英文:
I'm analyzing data from an AB test we just finished running. Our outcome is binary, y
, and we have stratified results by a third variable, g
.
Because the intervention could vary by g
, I've fit a Poisson regression with robust covariance estimation as follows
library(tidyverse)
library(sandwich)
library(marginaleffects)
fit <- glm(y ~ treatment * g, data=model_data, family=poisson, offset=log(n_users))
From here, I'd like to know the strata specific causal risk ration (which we usually call "lift" in industry). My approach is to use avg_comparisons
as follows
avg_comparisons(fit,
variables = 'treatment',
newdata = model_data,
transform_pre = 'lnratioavg',
transform_post = exp,
by=c('g'),
vcov = 'HC')
The result seems to be consistent with calculations of the lift when I filter the data by groups in g
.
Question
By passing by=c('g')
, am I actually calculating the strata specific risk ratios as I suspect? Is there any hidden "gotchas" or things I have failed to consider?
I can provide data and a minimal working example if need be.
答案1
得分: 1
以下是翻译的代码部分:
这是一个非常简单的基本`R`示例,展示了底层发生的情况:
library(marginaleffects)
fit <- glm(carb ~ hp * am, data = mtcars, family = poisson)
与更改hp
的值关联的对数比率的单元水平估计:
cmp <- comparisons(fit, variables = "hp", transform_pre = "lnratio")
cmp
#
# Term Contrast Estimate Std. Error z Pr(>|z|) 2.5 % 97.5 %
# hp +1 0.0056 0.0016 3.6 <0.001 0.00252 0.0086
# hp +1 0.0056 0.0016 3.6 <0.001 0.00252 0.0086
# hp +1 0.0056 0.0016 3.6 <0.001 0.00252 0.0086
# hp +1 0.0054 0.0027 2.0 0.047 0.00007 0.0107
# hp +1 0.0054 0.0027 2.0 0.047 0.00007 0.0107
# --- 22 rows omitted. See ?avg_comparisons and ?print.marginaleffects ---
# hp +1 0.0056 0.0016 3.6 <0.001 0.00252 0.0086
# hp +1 0.0056 0.0016 3.6 <0.001 0.00252 0.0086
# hp +1 0.0056 0.0016 3.6 <0.001 0.00252 0.0086
# hp +1 0.0056 0.0016 3.6 <0.001 0.00252 0.0086
# hp +1 0.0056 0.0016 3.6 <0.001 0.00252 0.0086
# Prediction type: response
# Columns: rowid, type, term, contrast, estimate, std.error, statistic, p.value, conf.low, conf.high, predicted, predicted_hi, predicted_lo, carb, hp, am
这等同于:
# 预测网格,差异为1单位
lo <- transform(mtcars, hp = hp - .5)
hi <- transform(mtcars, hp = hp + .5)
# 响应比例上的预测
y_lo <- predict(fit, newdata = lo, type = "response")
y_hi <- predict(fit, newdata = hi, type = "response")
# 对数比率
lnratio <- log(y_hi / y_lo)
# 等同于`comparisons()`
all(cmp$estimate == lnratio)
# [1] TRUE
现在我们取各层的均值,并在log()
内部:
by(data.frame(am = lo$am, y_lo, y_hi),
mtcars$am,
FUN = \(x) log(mean(x$y_hi) / mean(x$y_lo)))
# mtcars$am: 0
# [1] 0.005364414
# ------------------------------------------------------------
# mtcars$am: 1
# [1] 0.005566092
与之相同:
avg_comparisons(fit, variables = "hp", by = "am", transform_pre = "lnratio") |>
print(digits = 7)
#
# Term Contrast am Estimate Std. Error z Pr(>|z|) 2.5 %
# hp mean(+1) 0 0.005364414 0.002701531 1.985694 0.04706726 6.951172e-05
# hp mean(+1) 1 0.005566092 0.001553855 3.582118 < 0.001 2.520592e-03
# 97.5 %
# 0.010659317
# 0.008611592
#
# Prediction type: response
# Columns: type, term, contrast, am, estimate, std.error, statistic, p.value, conf.low, conf.high, predicted, predicted_hi, predicted_lo
在此处查看转换函数的列表:https://vincentarelbundock.github.io/marginaleffects/reference/comparisons.html#transformations
唯一的不同之处在于by
在层内应用该函数。
英文:
Here’s a very simple base R
example to show what is happening under-the-hood:
library(marginaleffects)
fit <- glm(carb ~ hp * am, data = mtcars, family = poisson)
Unit level estimates of log ratio associated with a change of 1 in hp
:
cmp <- comparisons(fit, variables = "hp", transform_pre = "lnratio")
cmp
#
# Term Contrast Estimate Std. Error z Pr(>|z|) 2.5 % 97.5 %
# hp +1 0.0056 0.0016 3.6 <0.001 0.00252 0.0086
# hp +1 0.0056 0.0016 3.6 <0.001 0.00252 0.0086
# hp +1 0.0056 0.0016 3.6 <0.001 0.00252 0.0086
# hp +1 0.0054 0.0027 2.0 0.047 0.00007 0.0107
# hp +1 0.0054 0.0027 2.0 0.047 0.00007 0.0107
# --- 22 rows omitted. See ?avg_comparisons and ?print.marginaleffects ---
# hp +1 0.0056 0.0016 3.6 <0.001 0.00252 0.0086
# hp +1 0.0056 0.0016 3.6 <0.001 0.00252 0.0086
# hp +1 0.0056 0.0016 3.6 <0.001 0.00252 0.0086
# hp +1 0.0056 0.0016 3.6 <0.001 0.00252 0.0086
# hp +1 0.0056 0.0016 3.6 <0.001 0.00252 0.0086
# Prediction type: response
# Columns: rowid, type, term, contrast, estimate, std.error, statistic, p.value, conf.low, conf.high, predicted, predicted_hi, predicted_lo, carb, hp, am
This is equivalent to:
# prediction grids with 1 unit difference
lo <- transform(mtcars, hp = hp - .5)
hi <- transform(mtcars, hp = hp + .5)
# predictions on response scale
y_lo <- predict(fit, newdata = lo, type = "response")
y_hi <- predict(fit, newdata = hi, type = "response")
# log ratio
lnratio <- log(y_hi / y_lo)
# equivalent to `comparisons()`
all(cmp$estimate == lnratio)
# [1] TRUE
Now we take the strata specific means, with mean()
inside log()
:
by(data.frame(am = lo$am, y_lo, y_hi),
mtcars$am,
FUN = \(x) log(mean(x$y_hi) / mean(x$y_lo)))
# mtcars$am: 0
# [1] 0.005364414
# ------------------------------------------------------------
# mtcars$am: 1
# [1] 0.005566092
Same as:
avg_comparisons(fit, variables = "hp", by = "am", transform_pre = "lnratio") |>
print(digits = 7)
#
# Term Contrast am Estimate Std. Error z Pr(>|z|) 2.5 %
# hp mean(+1) 0 0.005364414 0.002701531 1.985694 0.04706726 6.951172e-05
# hp mean(+1) 1 0.005566092 0.001553855 3.582118 < 0.001 2.520592e-03
# 97.5 %
# 0.010659317
# 0.008611592
#
# Prediction type: response
# Columns: type, term, contrast, am, estimate, std.error, statistic, p.value, conf.low, conf.high, predicted, predicted_hi, predicted_lo
See the list of transformation functions here: https://vincentarelbundock.github.io/marginaleffects/reference/comparisons.html#transformations
The only thing is that by
applies the function within stratas.
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论