`avg_comparisons`函数中的`by`参数是否计算分层特定的边际效应?

huangapple go评论105阅读模式
英文:

Does the `by` argument in `avg_comparisons` compute the strata specific marginal effect?

问题

我正在分析我们刚刚完成的AB测试的数据。我们的结果是二元的,y,并且我们已经按第三个变量 g 进行了分层结果。

因为干预可能因 g 而异,我已经使用鲁棒协方差估计拟合了泊松回归,如下所示:

  1. library(tidyverse)
  2. library(sandwich)
  3. library(marginaleffects)
  4. fit <- glm(y ~ treatment * g, data=model_data, family=poisson, offset=log(n_users))

从这里,我想要了解各层特定的因果风险比(通常在行业中称为 "lift")。我的方法是使用 avg_comparisons 如下所示:

  1. avg_comparisons(fit,
  2. variables = 'treatment',
  3. newdata = model_data,
  4. transform_pre = 'lnratioavg',
  5. transform_post = exp,
  6. by=c('g'),
  7. vcov = 'HC')

结果似乎与当我按 g 中的组筛选数据时计算的 "lift" 一致。

问题

通过传递 by=c('g'),我是否实际上正在计算我怀疑的各层特定的风险比?是否有任何隐藏的 "陷阱" 或我未考虑到的事情?

如果需要,我可以提供数据和一个最小的工作示例。

英文:

I'm analyzing data from an AB test we just finished running. Our outcome is binary, y, and we have stratified results by a third variable, g.

Because the intervention could vary by g, I've fit a Poisson regression with robust covariance estimation as follows

  1. library(tidyverse)
  2. library(sandwich)
  3. library(marginaleffects)
  4. fit &lt;- glm(y ~ treatment * g, data=model_data, family=poisson, offset=log(n_users))

From here, I'd like to know the strata specific causal risk ration (which we usually call "lift" in industry). My approach is to use avg_comparisons as follows

  1. avg_comparisons(fit,
  2. variables = &#39;treatment&#39;,
  3. newdata = model_data,
  4. transform_pre = &#39;lnratioavg&#39;,
  5. transform_post = exp,
  6. by=c(&#39;g&#39;),
  7. vcov = &#39;HC&#39;)

The result seems to be consistent with calculations of the lift when I filter the data by groups in g.

Question

By passing by=c(&#39;g&#39;), am I actually calculating the strata specific risk ratios as I suspect? Is there any hidden "gotchas" or things I have failed to consider?

I can provide data and a minimal working example if need be.

答案1

得分: 1

以下是翻译的代码部分:

  1. 这是一个非常简单的基本`R`示例,展示了底层发生的情况:
  2. library(marginaleffects)
  3. fit <- glm(carb ~ hp * am, data = mtcars, family = poisson)

与更改hp的值关联的对数比率的单元水平估计:

  1. cmp <- comparisons(fit, variables = "hp", transform_pre = "lnratio")
  2. cmp
  3. #
  4. # Term Contrast Estimate Std. Error z Pr(&gt;|z|) 2.5 % 97.5 %
  5. # hp +1 0.0056 0.0016 3.6 &lt;0.001 0.00252 0.0086
  6. # hp +1 0.0056 0.0016 3.6 &lt;0.001 0.00252 0.0086
  7. # hp +1 0.0056 0.0016 3.6 &lt;0.001 0.00252 0.0086
  8. # hp +1 0.0054 0.0027 2.0 0.047 0.00007 0.0107
  9. # hp +1 0.0054 0.0027 2.0 0.047 0.00007 0.0107
  10. # --- 22 rows omitted. See ?avg_comparisons and ?print.marginaleffects ---
  11. # hp +1 0.0056 0.0016 3.6 &lt;0.001 0.00252 0.0086
  12. # hp +1 0.0056 0.0016 3.6 &lt;0.001 0.00252 0.0086
  13. # hp +1 0.0056 0.0016 3.6 &lt;0.001 0.00252 0.0086
  14. # hp +1 0.0056 0.0016 3.6 &lt;0.001 0.00252 0.0086
  15. # hp +1 0.0056 0.0016 3.6 &lt;0.001 0.00252 0.0086
  16. # Prediction type: response
  17. # Columns: rowid, type, term, contrast, estimate, std.error, statistic, p.value, conf.low, conf.high, predicted, predicted_hi, predicted_lo, carb, hp, am

这等同于:

  1. # 预测网格,差异为1单位
  2. lo <- transform(mtcars, hp = hp - .5)
  3. hi <- transform(mtcars, hp = hp + .5)
  4. # 响应比例上的预测
  5. y_lo <- predict(fit, newdata = lo, type = "response")
  6. y_hi <- predict(fit, newdata = hi, type = "response")
  7. # 对数比率
  8. lnratio <- log(y_hi / y_lo)
  9. # 等同于`comparisons()`
  10. all(cmp$estimate == lnratio)
  11. # [1] TRUE

现在我们取各层的均值,并在log()内部:

  1. by(data.frame(am = lo$am, y_lo, y_hi),
  2. mtcars$am,
  3. FUN = \(x) log(mean(x$y_hi) / mean(x$y_lo)))
  4. # mtcars$am: 0
  5. # [1] 0.005364414
  6. # ------------------------------------------------------------
  7. # mtcars$am: 1
  8. # [1] 0.005566092

与之相同:

  1. avg_comparisons(fit, variables = "hp", by = "am", transform_pre = "lnratio") |&gt;
  2. print(digits = 7)
  3. #
  4. # Term Contrast am Estimate Std. Error z Pr(&gt;|z|) 2.5 %
  5. # hp mean(+1) 0 0.005364414 0.002701531 1.985694 0.04706726 6.951172e-05
  6. # hp mean(+1) 1 0.005566092 0.001553855 3.582118 &lt; 0.001 2.520592e-03
  7. # 97.5 %
  8. # 0.010659317
  9. # 0.008611592
  10. #
  11. # Prediction type: response
  12. # Columns: type, term, contrast, am, estimate, std.error, statistic, p.value, conf.low, conf.high, predicted, predicted_hi, predicted_lo

在此处查看转换函数的列表:https://vincentarelbundock.github.io/marginaleffects/reference/comparisons.html#transformations

唯一的不同之处在于by在层内应用该函数。

英文:

Here’s a very simple base R example to show what is happening under-the-hood:

  1. library(marginaleffects)
  2. fit &lt;- glm(carb ~ hp * am, data = mtcars, family = poisson)

Unit level estimates of log ratio associated with a change of 1 in hp:

  1. cmp &lt;- comparisons(fit, variables = &quot;hp&quot;, transform_pre = &quot;lnratio&quot;)
  2. cmp
  3. #
  4. # Term Contrast Estimate Std. Error z Pr(&gt;|z|) 2.5 % 97.5 %
  5. # hp +1 0.0056 0.0016 3.6 &lt;0.001 0.00252 0.0086
  6. # hp +1 0.0056 0.0016 3.6 &lt;0.001 0.00252 0.0086
  7. # hp +1 0.0056 0.0016 3.6 &lt;0.001 0.00252 0.0086
  8. # hp +1 0.0054 0.0027 2.0 0.047 0.00007 0.0107
  9. # hp +1 0.0054 0.0027 2.0 0.047 0.00007 0.0107
  10. # --- 22 rows omitted. See ?avg_comparisons and ?print.marginaleffects ---
  11. # hp +1 0.0056 0.0016 3.6 &lt;0.001 0.00252 0.0086
  12. # hp +1 0.0056 0.0016 3.6 &lt;0.001 0.00252 0.0086
  13. # hp +1 0.0056 0.0016 3.6 &lt;0.001 0.00252 0.0086
  14. # hp +1 0.0056 0.0016 3.6 &lt;0.001 0.00252 0.0086
  15. # hp +1 0.0056 0.0016 3.6 &lt;0.001 0.00252 0.0086
  16. # Prediction type: response
  17. # Columns: rowid, type, term, contrast, estimate, std.error, statistic, p.value, conf.low, conf.high, predicted, predicted_hi, predicted_lo, carb, hp, am

This is equivalent to:

  1. # prediction grids with 1 unit difference
  2. lo &lt;- transform(mtcars, hp = hp - .5)
  3. hi &lt;- transform(mtcars, hp = hp + .5)
  4. # predictions on response scale
  5. y_lo &lt;- predict(fit, newdata = lo, type = &quot;response&quot;)
  6. y_hi &lt;- predict(fit, newdata = hi, type = &quot;response&quot;)
  7. # log ratio
  8. lnratio &lt;- log(y_hi / y_lo)
  9. # equivalent to `comparisons()`
  10. all(cmp$estimate == lnratio)
  11. # [1] TRUE

Now we take the strata specific means, with mean() inside log():

  1. by(data.frame(am = lo$am, y_lo, y_hi),
  2. mtcars$am,
  3. FUN = \(x) log(mean(x$y_hi) / mean(x$y_lo)))
  4. # mtcars$am: 0
  5. # [1] 0.005364414
  6. # ------------------------------------------------------------
  7. # mtcars$am: 1
  8. # [1] 0.005566092

Same as:

  1. avg_comparisons(fit, variables = &quot;hp&quot;, by = &quot;am&quot;, transform_pre = &quot;lnratio&quot;) |&gt;
  2. print(digits = 7)
  3. #
  4. # Term Contrast am Estimate Std. Error z Pr(&gt;|z|) 2.5 %
  5. # hp mean(+1) 0 0.005364414 0.002701531 1.985694 0.04706726 6.951172e-05
  6. # hp mean(+1) 1 0.005566092 0.001553855 3.582118 &lt; 0.001 2.520592e-03
  7. # 97.5 %
  8. # 0.010659317
  9. # 0.008611592
  10. #
  11. # Prediction type: response
  12. # Columns: type, term, contrast, am, estimate, std.error, statistic, p.value, conf.low, conf.high, predicted, predicted_hi, predicted_lo

See the list of transformation functions here: https://vincentarelbundock.github.io/marginaleffects/reference/comparisons.html#transformations

The only thing is that by applies the function within stratas.

huangapple
  • 本文由 发表于 2023年2月19日 01:30:10
  • 转载请务必保留本文链接:https://go.coder-hub.com/75495116.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定