英文:
Dominance analysis with Dirichlet regression: error related to formula syntax?
问题
目标
我想在Dirichlet回归上运行优势分析,以近似估计一组预测变量的相对重要性(包括缩放的连续预测变量、带样条的连续预测变量和因子)。Dirichlet回归是对Beta回归的扩展,用于建模不是基于计数的比例,并且将比例分为多个类别,参见Douma&weedon(2019)。
建模方法:语法可能很重要
我使用DirichletReg
包来拟合Dirichlet回归,采用“alternative”参数化:这允许同时估计参数和估计的精度。语法为:response ~ parameters | precision
。可以使用不同的预测变量来估计参数和估计精度:response ~ predictor1 + predictor2 | predictor3
。如果未声明,模型将假定固定精度:response ~ predictors
,也可以明确声明为:response ~ predictors | 1
。
我认为错误与公式中的竖线有关,该竖线将用于估计参数的预测变量与用于估计精度的预测变量分开。
我依赖于performance::r2()
来计算模型质量的度量:Nagelkerke的伪R2。然而,对于实际分析,我考虑使用McFadden或Estrella的伪R2,因为它们似乎适用于在多项响应上运行优势分析,参见Luchman 2014。
障碍
我收到错误消息:"fitstat requires at least two elements"
。
一个可重现的示例
使用DirichletReg
包中提供的数据。响应变量只有两个类别,但无论如何,它会产生与实际分析中相同的错误消息。
library(DirichletReg)
#> Warning: package 'DirichletReg' was built under R version 4.1.3
#> Loading required package: Formula
#> Warning: package 'Formula' was built under R version 4.1.1
library(domir)
library(performance)
#> Warning: package 'performance' was built under R version 4.1.3
# Assemble data
RS <- ReadingSkills
RS$acc <- DR_data(RS$accuracy)
#> only one variable in [0, 1] supplied - beta-distribution assumed.
#> check this assumption.
RS$dyslexia <- C(RS$dyslexia, treatment)
# Fit Dirichlet regression
rs2 <- DirichReg(acc ~ dyslexia + iq | dyslexia + iq, data = RS, model = "alternative")
summary(rs2)
#> Call:
#> DirichReg(formula = acc ~ dyslexia + iq | dyslexia + iq, data = RS, model =
#> "alternative")
#>
#> Standardized Residuals:
#> Min 1Q Median 3Q Max
#> 1 - accuracy -1.5279 -0.7798 -0.343 0.6992 2.4213
#> accuracy -2.4213 -0.6992 0.343 0.7798 1.5279
#>
#> MEAN MODELS:
#> ------------------------------------------------------------------
#> Coefficients for variable no. 1: 1 - accuracy
#> - variable omitted (reference category) -
#> ------------------------------------------------------------------
#> Coefficients for variable no. 2: accuracy
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) 2.22386 0.28087 7.918 2.42e-15 ***
#> dyslexiayes -1.81261 0.29696 -6.104 1.04e-09 ***
#> iq -0.02676 0.06900 -0.388 0.698
#> ------------------------------------------------------------------
#>
#> PRECISION MODEL:
#> ------------------------------------------------------------------
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) 1.71017 0.32697 5.230 1.69e-07 ***
#> dyslexiayes 2.47521 0.55055 4.496 6.93e-06 ***
#> iq 0.04097 0.27537 0.149 0.882
#> ------------------------------------------------------------------
#> Significance codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Log-likelihood: 61.26 on 6 df (33 BFGS + 1 NR Iterations)
#> AIC: -110.5, BIC: -99.81
#> Number of Observations: 44
#> Links: Logit (Means) and Log (Precision)
#> Parametrization: alternative
as.numeric(performance::r2(rs2))
#> [1] 0.4590758
# Run dominance analysis: error
# If left undeclared, the model assumes fixed precision: parameters | 1
domir::domin(acc ~ dyslexia + iq,
reg = function(y) DirichletReg::DirichReg(y, data = RS, model = "alternative"),
fitstat = list(\(x) list(r2.nagelkerke = as.numeric(performance::r2(x)), "r2.nagelkerke"))
)
#> Error in domir::domin(acc ~ dyslexia + iq, reg = function(y) DirichletReg::DirichReg(y, : fitstat requires at least two elements.
domir::domin(acc ~ dyslexia + iq | 1,
reg = function(y) DirichletReg::DirichReg(y, data = RS, model = "alternative"),
fitstat = list(\(x) list(r2.nagelkerke = as.numeric(performance::r2(x)), "r2.nagelkerke"))
)
#> Error in domir::domin(acc ~ dyslexia + iq | 1, reg = function(y) DirichletReg::DirichReg(y, : fitstat requires at least two elements.
domir::domin(acc ~ dyslexia + iq | dyslexia + iq,
reg = function(y) DirichletReg::DirichReg(y, data = RS, model = "alternative"),
fitstat = list(\(x) list(r2.nagelkerke = as.numeric(performance::r2(x)), "r2.nagelkerke"))
)
#> Error in domir::domin(acc ~ dyslexia + iq | dyslexia + iq, reg = function(y) DirichletReg::DirichReg(y, : fitstat requires at least two elements.
domir::domin(acc ~ dyslexia + iq,
reg = function(y) DirichletReg::DirichReg(y, data = RS, model = "alternative"),
fitstat = list(\(x) list(r2.nagelkerke = as.numeric(performance::r2(x)), "r2.nagelkerke")),
consmodel = "| dyslexia + iq"
)
#> Error in domir::domin(acc ~ dyslexia + iq, reg = function(y) DirichletReg::DirichReg(y, : fitstat requires at least two elements.
sessionInfo()
#> R version 4.1.0 (2021-05-18)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 19045)
#>
#> Matrix products: default
#>
#> locale:
#> [1] LC_COLLATE=Spanish_Spain.1252 LC_CTYPE=Spanish_Spain.1252
#> [3] LC_MONETARY=Spanish_Spain.1252 LC_NUMERIC=C
#> [5] LC_TIME=Spanish_Spain.1252
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] performance_0.10.0 domir_1.0.1 DirichletReg_0.7-1 Formula_1.2-4
#>
#> loaded via a namespace (and not attached):
#> [1] rstudioapi_0.13 knitr_1.38 magrittr_2.0.3 insight_0.19.1
#> [5] lattice_0.20-44 rlang_1.1.0 fastmap_1.1.0 stringr_1.5.0
#> [9] highr_0.9 tools_4.1.0 grid_4.1.0 xfun_0.30
#> [13] cli_3.6.0 withr_2.5.0 htmltools_0.5.2 maxLik_1.5-2
#> [17] miscTools_0.6-28 yaml_2.3.5 digest_0.6.29 lifecycle_1.0.3
#> [21] vctrs_0.6.1 fs_1.5.2 glue_1.6.2 evaluate_0.15
#> [25] rmarkdown_2.13 sandwich_3.0-1 reprex_2.0.1 stringi_1.7.6
#> [29] compiler_4.1.0 generics_0.1.2 zoo_1.8-9
<sup>由reprex包(v2.0.1)于2023-07-27创建</sup>
参考文献
编辑(2023年7月31日)
非常感谢Joseph Luchman提供的解决方案!我修改了函数,避免使用管道,并在paste0()
调用中添加了精度公式。不幸的是,reprex::reprex()
无法呈现结果,因此我在下面粘贴了一个屏幕截图。
domir::domir(acc ~ dyslexia + iq, function(y) {
iv <- attr(terms(y), "term.labels")
fml <- paste0("acc ~ ", paste0(iv, collapse = "+"), "| dyslexia + iq", collapse = "")
print(fml)
performance::r2( DirichReg(as.formula(fml), data = RS, model = "alternative") )[[1]]})
英文:
The goal
I want to run dominance analysis on a Dirichlet regression, to approximate the relative importance of a set of predictors (scaled continuous predictors, continuous predictors with splines, and factors). Dirichlet regression is an extension of beta regression to model proportions that are not derived from counts, and that are split between more than 2 categories, see Douma&weedon (2019).
The modelling approach: the syntax is potentially important
I am using the DirichletReg
package to fit a Dirichlet regression, with an "alternative"
parametrization: this allows to estimate simultaneously the parameters and the precision of the estimation. The syntax is: response ~ parameters | precision
. The estimation of parameters can be done with different predictors from those used to estimate precision: response ~ predictor1 + predictor2 | predictor3
. If left undeclared, the model assumes fixed precision: response ~ predictors
, which can be declared explicilty as: response ~ predictors | 1
.
I think that the error is related to the vertical bar in the formula, which separates the predictors used to estimate parameters from the predictors used to estimate precision.
I rely on performance::r2()
to calculate a metric of model quality: Nagelkerke's pseudo-R2. However, for the actual analysis, I am thinking of either McFadden or Estrella's pseudo-R2, as they appear suitable to run dominance analysis on multinomial responses, see Luchman 2014.
The obstacle
I get the error message: "fitstat requires at least two elements".
A reproducible example
From data available in the DirichletReg
package. The response is only two categories, but in any case, it yields the same error message as in the actual analysis.
library(DirichletReg)
#> Warning: package 'DirichletReg' was built under R version 4.1.3
#> Loading required package: Formula
#> Warning: package 'Formula' was built under R version 4.1.1
library(domir)
library(performance)
#> Warning: package 'performance' was built under R version 4.1.3
# Assemble data
RS <- ReadingSkills
RS$acc <- DR_data(RS$accuracy)
#> only one variable in [0, 1] supplied - beta-distribution assumed.
#> check this assumption.
RS$dyslexia <- C(RS$dyslexia, treatment)
# Fit Dirichlet regression
rs2 <- DirichReg(acc ~ dyslexia + iq | dyslexia + iq, data = RS, model = "alternative")
summary(rs2)
#> Call:
#> DirichReg(formula = acc ~ dyslexia + iq | dyslexia + iq, data = RS, model =
#> "alternative")
#>
#> Standardized Residuals:
#> Min 1Q Median 3Q Max
#> 1 - accuracy -1.5279 -0.7798 -0.343 0.6992 2.4213
#> accuracy -2.4213 -0.6992 0.343 0.7798 1.5279
#>
#> MEAN MODELS:
#> ------------------------------------------------------------------
#> Coefficients for variable no. 1: 1 - accuracy
#> - variable omitted (reference category) -
#> ------------------------------------------------------------------
#> Coefficients for variable no. 2: accuracy
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) 2.22386 0.28087 7.918 2.42e-15 ***
#> dyslexiayes -1.81261 0.29696 -6.104 1.04e-09 ***
#> iq -0.02676 0.06900 -0.388 0.698
#> ------------------------------------------------------------------
#>
#> PRECISION MODEL:
#> ------------------------------------------------------------------
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) 1.71017 0.32697 5.230 1.69e-07 ***
#> dyslexiayes 2.47521 0.55055 4.496 6.93e-06 ***
#> iq 0.04097 0.27537 0.149 0.882
#> ------------------------------------------------------------------
#> Significance codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Log-likelihood: 61.26 on 6 df (33 BFGS + 1 NR Iterations)
#> AIC: -110.5, BIC: -99.81
#> Number of Observations: 44
#> Links: Logit (Means) and Log (Precision)
#> Parametrization: alternative
as.numeric(performance::r2(rs2))
#> [1] 0.4590758
# Run dominance analysis: error
# If left undeclared, the model assumes fixed precision: parameters | 1
domir::domin(acc ~ dyslexia + iq,
reg = function(y) DirichletReg::DirichReg(y, data = RS, model = "alternative"),
fitstat = list(\(x) list(r2.nagelkerke = as.numeric(performance::r2(x)), "r2.nagelkerke"))
)
#> Error in domir::domin(acc ~ dyslexia + iq, reg = function(y) DirichletReg::DirichReg(y, : fitstat requires at least two elements.
domir::domin(acc ~ dyslexia + iq | 1,
reg = function(y) DirichletReg::DirichReg(y, data = RS, model = "alternative"),
fitstat = list(\(x) list(r2.nagelkerke = as.numeric(performance::r2(x)), "r2.nagelkerke"))
)
#> Error in domir::domin(acc ~ dyslexia + iq | 1, reg = function(y) DirichletReg::DirichReg(y, : fitstat requires at least two elements.
domir::domin(acc ~ dyslexia + iq | dyslexia + iq,
reg = function(y) DirichletReg::DirichReg(y, data = RS, model = "alternative"),
fitstat = list(\(x) list(r2.nagelkerke = as.numeric(performance::r2(x)), "r2.nagelkerke"))
)
#> Error in domir::domin(acc ~ dyslexia + iq | dyslexia + iq, reg = function(y) DirichletReg::DirichReg(y, : fitstat requires at least two elements.
domir::domin(acc ~ dyslexia + iq,
reg = function(y) DirichletReg::DirichReg(y, data = RS, model = "alternative"),
fitstat = list(\(x) list(r2.nagelkerke = as.numeric(performance::r2(x)), "r2.nagelkerke")),
consmodel = "| dyslexia + iq"
)
#> Error in domir::domin(acc ~ dyslexia + iq, reg = function(y) DirichletReg::DirichReg(y, : fitstat requires at least two elements.
sessionInfo()
#> R version 4.1.0 (2021-05-18)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 19045)
#>
#> Matrix products: default
#>
#> locale:
#> [1] LC_COLLATE=Spanish_Spain.1252 LC_CTYPE=Spanish_Spain.1252
#> [3] LC_MONETARY=Spanish_Spain.1252 LC_NUMERIC=C
#> [5] LC_TIME=Spanish_Spain.1252
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] performance_0.10.0 domir_1.0.1 DirichletReg_0.7-1 Formula_1.2-4
#>
#> loaded via a namespace (and not attached):
#> [1] rstudioapi_0.13 knitr_1.38 magrittr_2.0.3 insight_0.19.1
#> [5] lattice_0.20-44 rlang_1.1.0 fastmap_1.1.0 stringr_1.5.0
#> [9] highr_0.9 tools_4.1.0 grid_4.1.0 xfun_0.30
#> [13] cli_3.6.0 withr_2.5.0 htmltools_0.5.2 maxLik_1.5-2
#> [17] miscTools_0.6-28 yaml_2.3.5 digest_0.6.29 lifecycle_1.0.3
#> [21] vctrs_0.6.1 fs_1.5.2 glue_1.6.2 evaluate_0.15
#> [25] rmarkdown_2.13 sandwich_3.0-1 reprex_2.0.1 stringi_1.7.6
#> [29] compiler_4.1.0 generics_0.1.2 zoo_1.8-9
<sup>Created on 2023-07-27 by the reprex package (v2.0.1)</sup>
References
EDIT (31 July 2023)
Many thanks to Joseph Luchman for the solution! I have modified the function to avoid the use of pipes, and I have added the formula for precision inside the paste0()
call. Unfortunately, reprex::reprex()
won't render the result, so I paste an screenshot below.
domir::domir(acc ~ dyslexia + iq, function(y) {
iv <- attr(terms(y), "term.labels")
fml <- paste0("acc ~ ", paste0(iv, collapse = "+"), "| dyslexia + iq", collapse = "")
print(fml)
performance::r2( DirichReg(as.formula(fml), data = RS, model = "alternative") )[[1]]})
答案1
得分: 1
这里提到的问题是由domin
中的list
提示的,提交给fitstat
的list
长度为1。
将括号移动到正确的位置可以解决这个问题,但同时揭示了另一个问题,我认为这与DirichletReg::DirichReg
的设计有关。
基本上,似乎DirichletReg::DirichReg
无法接受懒惰求值的formula
,而使用domin
时需要这样的formula
。
例如,大多数具有formula
的建模函数允许像这样的用法:
lapply(list(mpg ~ am, mpg ~ vs), lm, data = datasets::mtcars)
正如你在输出的Call部分中看到的那样,lm
以灵活的方式接受参数,并在需要时对formula
进行求值,应用于数据。
但是,当尝试使用DirichReg
时,使用焦点模型结果的部分会出现错误:
lapply(list(acc ~ dyslexia, acc ~ iq), DirichReg, data = RS, model = "alternative")
实际上,DirichReg
需要将formula
作为字符串进行解析(它使用match.call
来解析要处理的参数;至少我认为这是问题所在)。
解决这个问题稍微复杂一些。必须动态地将domin
的formula
(或在下面的示例中使用更新的domir::domir
)提交给每个函数调用,以重构一个字符串formula
,然后在提交给DirichReg
时使用as.formula
进行解释。生成的公式也会被打印出来。
domir(acc ~ dyslexia + iq, function(y) {
iv <- terms(y) |> attr("term.labels")
fml <- paste0("acc ~ ", paste0(iv, collapse = "+"), collapse = "")
print(fml)
DirichReg(as.formula(fml), data = RS, model = "alternative") |> performance::r2() |> _[[1]]})
总体值:0.6568343
一般优势值:
一般优势 标准化排名
dyslexia 0.4983012 0.7586406 1
iq 0.1585332 0.2413594 2
条件优势值:
子集大小:1 子集大小:2
dyslexia 0.6498178 0.346784532
iq 0.3100498 0.007016514
完全优势指定:
Dmnated?dyslexia Dmnated?iq
Dmnates?dyslexia NA TRUE
Dmnates?iq FALSE NA
英文:
The issue asked about here is hinted at by domin
as the list
submitted to fitstat
is length 1.
> list(\(x) list(r2.nagelkerke = as.numeric(performance::r2(x)), "r2.nagelkerke"))
[[1]]
\(x) list(r2.nagelkerke = as.numeric(performance::r2(x)), "r2.nagelkerke")
Moving the parentheses fixes it but reveals another that, I believe, is related to the design on DirichletReg::DirichReg
.
> domir::domin(acc ~ dyslexia + iq,
+ reg = function(y) DirichletReg::DirichReg(y, data = RS, model = "alternative"),
+ fitstat = list(\(x) list(r2.nagelkerke = as.numeric(performance::r2(x))), "r2.nagelkerke")
+ )
Error in x$formula : object of type 'symbol' is not subsettable
Basically, it seems that DirichletReg::DirichReg
cannot accept a lazily evaluated formula
that is required for using domin
.
For instance, most modeling functions with a formula
allow for something like:
> lapply(list(mpg ~ am, mpg ~ vs), lm, data = datasets::mtcars)
[[1]]
Call:
FUN(formula = X[[i]], data = ..1)
Coefficients:
(Intercept) am
17.147 7.245
[[2]]
Call:
FUN(formula = X[[i]], data = ..1)
Coefficients:
(Intercept) vs
16.62 7.94
As you can see in the Call portion of the output, lm
accepts arguments in a flexible way and evaluates the formula when it needs to, as applied to the data.
When trying something similar with DirichReg
using parts of the focal model results in:
> lapply(list(acc ~ dyslexia, acc ~ iq), DirichReg, data = RS, model = "alternative")
Error in eval(x) : object 'X' not found
DirichReg
actually needs to 'see' the formula as a string (as it uses match.call
to parse arguments for processing; at least I believe this is the issue).
The resolution to this one is a little more complex. Have to, on the fly, take the formula domin
(or in the case below, I use the more updated domir::domir
; also note I'm using R v4.3 to allow the element selection with the base R pipe) submits to each function call to reconstruct a string formula that is then interpreted as.formula
when submitted to DirichReg
in the example below. The formulas produced are also printed.
> domir(acc ~ dyslexia + iq, function(y) {
+ iv <- terms(y) |> attr("term.labels")
+ fml <- paste0("acc ~ ", paste0(iv, collapse = "+"), collapse = "")
+ print(fml)
+ DirichReg(as.formula(fml), data = RS, model = "alternative") |> performance::r2() |> _[[1]]})
[1] "acc ~ dyslexia+iq"
[1] "acc ~ dyslexia"
[1] "acc ~ iq"
Overall Value: 0.6568343
General Dominance Values:
General Dominance Standardized Ranks
dyslexia 0.4983012 0.7586406 1
iq 0.1585332 0.2413594 2
Conditional Dominance Values:
Subset Size: 1 Subset Size: 2
dyslexia 0.6498178 0.346784532
iq 0.3100498 0.007016514
Complete Dominance Designations:
Dmnated?dyslexia Dmnated?iq
Dmnates?dyslexia NA TRUE
Dmnates?iq FALSE NA
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论