Dominance analysis with Dirichlet regression: 公式语法相关的错误吗?

huangapple go评论61阅读模式
英文:

Dominance analysis with Dirichlet regression: error related to formula syntax?

问题

目标

我想在Dirichlet回归上运行优势分析,以近似估计一组预测变量的相对重要性(包括缩放的连续预测变量、带样条的连续预测变量和因子)。Dirichlet回归是对Beta回归的扩展,用于建模不是基于计数的比例,并且将比例分为多个类别,参见Douma&weedon(2019)。

建模方法:语法可能很重要

我使用DirichletReg包来拟合Dirichlet回归,采用“alternative”参数化:这允许同时估计参数和估计的精度。语法为:response ~ parameters | precision。可以使用不同的预测变量来估计参数和估计精度:response ~ predictor1 + predictor2 | predictor3。如果未声明,模型将假定固定精度:response ~ predictors,也可以明确声明为:response ~ predictors | 1

我认为错误与公式中的竖线有关,该竖线将用于估计参数的预测变量与用于估计精度的预测变量分开。

我依赖于performance::r2()来计算模型质量的度量:Nagelkerke的伪R2。然而,对于实际分析,我考虑使用McFadden或Estrella的伪R2,因为它们似乎适用于在多项响应上运行优势分析,参见Luchman 2014。

障碍

我收到错误消息:"fitstat requires at least two elements"

一个可重现的示例

使用DirichletReg包中提供的数据。响应变量只有两个类别,但无论如何,它会产生与实际分析中相同的错误消息。

library(DirichletReg)
#> Warning: package 'DirichletReg' was built under R version 4.1.3
#> Loading required package: Formula
#> Warning: package 'Formula' was built under R version 4.1.1
library(domir)
library(performance)
#> Warning: package 'performance' was built under R version 4.1.3

# Assemble data
RS <- ReadingSkills
RS$acc <- DR_data(RS$accuracy)
#> only one variable in [0, 1] supplied - beta-distribution assumed.
#> check this assumption.
RS$dyslexia <- C(RS$dyslexia, treatment)

# Fit Dirichlet regression
rs2 <- DirichReg(acc ~ dyslexia + iq | dyslexia + iq, data = RS, model = "alternative")

summary(rs2)
#> Call:
#> DirichReg(formula = acc ~ dyslexia + iq | dyslexia + iq, data = RS, model =
#> "alternative")
#> 
#> Standardized Residuals:
#>                   Min       1Q  Median      3Q     Max
#> 1 - accuracy  -1.5279  -0.7798  -0.343  0.6992  2.4213
#> accuracy      -2.4213  -0.6992   0.343  0.7798  1.5279
#> 
#> MEAN MODELS:
#> ------------------------------------------------------------------
#> Coefficients for variable no. 1: 1 - accuracy
#> - variable omitted (reference category) -
#> ------------------------------------------------------------------
#> Coefficients for variable no. 2: accuracy
#>             Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)  2.22386    0.28087   7.918 2.42e-15 ***
#> dyslexiayes -1.81261    0.29696  -6.104 1.04e-09 ***
#> iq          -0.02676    0.06900  -0.388    0.698    
#> ------------------------------------------------------------------
#> 
#> PRECISION MODEL:
#> ------------------------------------------------------------------
#>             Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)  1.71017    0.32697   5.230 1.69e-07 ***
#> dyslexiayes  2.47521    0.55055   4.496 6.93e-06 ***
#> iq           0.04097    0.27537   0.149    0.882    
#> ------------------------------------------------------------------
#> Significance codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Log-likelihood: 61.26 on 6 df (33 BFGS + 1 NR Iterations)
#> AIC: -110.5, BIC: -99.81
#> Number of Observations: 44
#> Links: Logit (Means) and Log (Precision)
#> Parametrization: alternative
as.numeric(performance::r2(rs2))
#> [1] 0.4590758

# Run dominance analysis: error

# If left undeclared, the model assumes fixed precision: parameters |  1
domir::domin(acc ~ dyslexia + iq,
             reg =  function(y)  DirichletReg::DirichReg(y, data = RS, model = "alternative"),
             fitstat = list(\(x) list(r2.nagelkerke = as.numeric(performance::r2(x)), "r2.nagelkerke"))
)
#> Error in domir::domin(acc ~ dyslexia + iq, reg = function(y) DirichletReg::DirichReg(y, : fitstat requires at least two elements.

domir::domin(acc ~ dyslexia + iq | 1,
             reg =  function(y)  DirichletReg::DirichReg(y, data = RS, model = "alternative"),
             fitstat = list(\(x) list(r2.nagelkerke = as.numeric(performance::r2(x)), "r2.nagelkerke"))
             )
#> Error in domir::domin(acc ~ dyslexia + iq | 1, reg = function(y) DirichletReg::DirichReg(y, : fitstat requires at least two elements.

domir::domin(acc ~ dyslexia + iq | dyslexia + iq,
             reg =  function(y)  DirichletReg::DirichReg(y, data = RS, model = "alternative"),
             fitstat = list(\(x) list(r2.nagelkerke = as.numeric(performance::r2(x)), "r2.nagelkerke"))
             )
#> Error in domir::domin(acc ~ dyslexia + iq | dyslexia + iq, reg = function(y) DirichletReg::DirichReg(y, : fitstat requires at least two elements.

domir::domin(acc ~ dyslexia + iq,
             reg =  function(y)  DirichletReg::DirichReg(y, data = RS, model = "alternative"),
             fitstat = list(\(x) list(r2.nagelkerke = as.numeric(performance::r2(x)), "r2.nagelkerke")),
             consmodel = "| dyslexia + iq"
             )
#> Error in domir::domin(acc ~ dyslexia + iq, reg = function(y) DirichletReg::DirichReg(y, : fitstat requires at least two elements.

sessionInfo()
#> R version 4.1.0 (2021-05-18)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 19045)
#> 
#> Matrix products: default
#> 
#> locale:
#> [1] LC_COLLATE=Spanish_Spain.1252  LC_CTYPE=Spanish_Spain.1252   
#> [3] LC_MONETARY=Spanish_Spain.1252 LC_NUMERIC=C                  
#> [5] LC_TIME=Spanish_Spain.1252    
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] performance_0.10.0 domir_1.0.1        DirichletReg_0.7-1 Formula_1.2-4     
#> 
#> loaded via a namespace (and not attached):
#>  [1] rstudioapi_0.13  knitr_1.38       magrittr_2.0.3   insight_0.19.1  
#>  [5] lattice_0.20-44  rlang_1.1.0      fastmap_1.1.0    stringr_1.5.0   
#>  [9] highr_0.9        tools_4.1.0      grid_4.1.0       xfun_0.30       
#> [13] cli_3.6.0        withr_2.5.0      htmltools_0.5.2  maxLik_1.5-2    
#> [17] miscTools_0.6-28 yaml_2.3.5       digest_0.6.29    lifecycle_1.0.3 
#> [21] vctrs_0.6.1      fs_1.5.2         glue_1.6.2       evaluate_0.15   
#> [25] rmarkdown_2.13   sandwich_3.0-1   reprex_2.0.1     stringi_1.7.6   
#> [29] compiler_4.1.0   generics_0.1.2   zoo_1.8-9

<sup>由reprex包(v2.0.1)于2023-07-27创建</sup>

参考文献

Luchman Relative Importance Analysis With Multicategory Dependent Variables:: An Extension and Review of Best Practices (2014) Organizational research methods

Douma & Weedon. Analysing continuous proportions in ecology and evolution: A practical introduction to beta and Dirichlet regression (2019) Methods in Ecology and Evolution

编辑(2023年7月31日)

非常感谢Joseph Luchman提供的解决方案!我修改了函数,避免使用管道,并在paste0()调用中添加了精度公式。不幸的是,reprex::reprex()无法呈现结果,因此我在下面粘贴了一个屏幕截图。

domir::domir(acc ~ dyslexia + iq, function(y)  {
  iv &lt;- attr(terms(y), &quot;term.labels&quot;)
  fml &lt;- paste0(&quot;acc ~ &quot;, paste0(iv, collapse = &quot;+&quot;), &quot;| dyslexia + iq&quot;, collapse = &quot;&quot;)
  print(fml)
  performance::r2( DirichReg(as.formula(fml), data = RS, model = &quot;alternative&quot;) )[[1]]})

Dominance analysis with Dirichlet regression: 公式语法相关的错误吗?

英文:

The goal

I want to run dominance analysis on a Dirichlet regression, to approximate the relative importance of a set of predictors (scaled continuous predictors, continuous predictors with splines, and factors). Dirichlet regression is an extension of beta regression to model proportions that are not derived from counts, and that are split between more than 2 categories, see Douma&weedon (2019).

The modelling approach: the syntax is potentially important

I am using the DirichletReg package to fit a Dirichlet regression, with an &quot;alternative&quot; parametrization: this allows to estimate simultaneously the parameters and the precision of the estimation. The syntax is: response ~ parameters | precision. The estimation of parameters can be done with different predictors from those used to estimate precision: response ~ predictor1 + predictor2 | predictor3. If left undeclared, the model assumes fixed precision: response ~ predictors, which can be declared explicilty as: response ~ predictors | 1.

I think that the error is related to the vertical bar in the formula, which separates the predictors used to estimate parameters from the predictors used to estimate precision.

I rely on performance::r2() to calculate a metric of model quality: Nagelkerke's pseudo-R2. However, for the actual analysis, I am thinking of either McFadden or Estrella's pseudo-R2, as they appear suitable to run dominance analysis on multinomial responses, see Luchman 2014.

The obstacle

I get the error message: &quot;fitstat requires at least two elements&quot;.

A reproducible example

From data available in the DirichletReg package. The response is only two categories, but in any case, it yields the same error message as in the actual analysis.

library(DirichletReg)
#&gt; Warning: package &#39;DirichletReg&#39; was built under R version 4.1.3
#&gt; Loading required package: Formula
#&gt; Warning: package &#39;Formula&#39; was built under R version 4.1.1
library(domir)
library(performance)
#&gt; Warning: package &#39;performance&#39; was built under R version 4.1.3

# Assemble data
RS &lt;- ReadingSkills
RS$acc &lt;- DR_data(RS$accuracy)
#&gt; only one variable in [0, 1] supplied - beta-distribution assumed.
#&gt; check this assumption.
RS$dyslexia &lt;- C(RS$dyslexia, treatment)

# Fit Dirichlet regression
rs2 &lt;- DirichReg(acc ~ dyslexia + iq | dyslexia + iq, data = RS, model = &quot;alternative&quot;)

summary(rs2)
#&gt; Call:
#&gt; DirichReg(formula = acc ~ dyslexia + iq | dyslexia + iq, data = RS, model =
#&gt; &quot;alternative&quot;)
#&gt; 
#&gt; Standardized Residuals:
#&gt;                   Min       1Q  Median      3Q     Max
#&gt; 1 - accuracy  -1.5279  -0.7798  -0.343  0.6992  2.4213
#&gt; accuracy      -2.4213  -0.6992   0.343  0.7798  1.5279
#&gt; 
#&gt; MEAN MODELS:
#&gt; ------------------------------------------------------------------
#&gt; Coefficients for variable no. 1: 1 - accuracy
#&gt; - variable omitted (reference category) -
#&gt; ------------------------------------------------------------------
#&gt; Coefficients for variable no. 2: accuracy
#&gt;             Estimate Std. Error z value Pr(&gt;|z|)    
#&gt; (Intercept)  2.22386    0.28087   7.918 2.42e-15 ***
#&gt; dyslexiayes -1.81261    0.29696  -6.104 1.04e-09 ***
#&gt; iq          -0.02676    0.06900  -0.388    0.698    
#&gt; ------------------------------------------------------------------
#&gt; 
#&gt; PRECISION MODEL:
#&gt; ------------------------------------------------------------------
#&gt;             Estimate Std. Error z value Pr(&gt;|z|)    
#&gt; (Intercept)  1.71017    0.32697   5.230 1.69e-07 ***
#&gt; dyslexiayes  2.47521    0.55055   4.496 6.93e-06 ***
#&gt; iq           0.04097    0.27537   0.149    0.882    
#&gt; ------------------------------------------------------------------
#&gt; Significance codes: 0 &#39;***&#39; 0.001 &#39;**&#39; 0.01 &#39;*&#39; 0.05 &#39;.&#39; 0.1 &#39; &#39; 1
#&gt; 
#&gt; Log-likelihood: 61.26 on 6 df (33 BFGS + 1 NR Iterations)
#&gt; AIC: -110.5, BIC: -99.81
#&gt; Number of Observations: 44
#&gt; Links: Logit (Means) and Log (Precision)
#&gt; Parametrization: alternative
as.numeric(performance::r2(rs2))
#&gt; [1] 0.4590758

# Run dominance analysis: error

# If left undeclared, the model assumes fixed precision: parameters |  1
domir::domin(acc ~ dyslexia + iq,
             reg =  function(y)  DirichletReg::DirichReg(y, data = RS, model = &quot;alternative&quot;),
             fitstat = list(\(x) list(r2.nagelkerke = as.numeric(performance::r2(x)), &quot;r2.nagelkerke&quot;))
)
#&gt; Error in domir::domin(acc ~ dyslexia + iq, reg = function(y) DirichletReg::DirichReg(y, : fitstat requires at least two elements.

domir::domin(acc ~ dyslexia + iq | 1,
             reg =  function(y)  DirichletReg::DirichReg(y, data = RS, model = &quot;alternative&quot;),
             fitstat = list(\(x) list(r2.nagelkerke = as.numeric(performance::r2(x)), &quot;r2.nagelkerke&quot;))
             )
#&gt; Error in domir::domin(acc ~ dyslexia + iq | 1, reg = function(y) DirichletReg::DirichReg(y, : fitstat requires at least two elements.

domir::domin(acc ~ dyslexia + iq | dyslexia + iq,
             reg =  function(y)  DirichletReg::DirichReg(y, data = RS, model = &quot;alternative&quot;),
             fitstat = list(\(x) list(r2.nagelkerke = as.numeric(performance::r2(x)), &quot;r2.nagelkerke&quot;))
             )
#&gt; Error in domir::domin(acc ~ dyslexia + iq | dyslexia + iq, reg = function(y) DirichletReg::DirichReg(y, : fitstat requires at least two elements.

domir::domin(acc ~ dyslexia + iq,
             reg =  function(y)  DirichletReg::DirichReg(y, data = RS, model = &quot;alternative&quot;),
             fitstat = list(\(x) list(r2.nagelkerke = as.numeric(performance::r2(x)), &quot;r2.nagelkerke&quot;)),
             consmodel = &quot;| dyslexia + iq&quot;
             )
#&gt; Error in domir::domin(acc ~ dyslexia + iq, reg = function(y) DirichletReg::DirichReg(y, : fitstat requires at least two elements.

sessionInfo()
#&gt; R version 4.1.0 (2021-05-18)
#&gt; Platform: x86_64-w64-mingw32/x64 (64-bit)
#&gt; Running under: Windows 10 x64 (build 19045)
#&gt; 
#&gt; Matrix products: default
#&gt; 
#&gt; locale:
#&gt; [1] LC_COLLATE=Spanish_Spain.1252  LC_CTYPE=Spanish_Spain.1252   
#&gt; [3] LC_MONETARY=Spanish_Spain.1252 LC_NUMERIC=C                  
#&gt; [5] LC_TIME=Spanish_Spain.1252    
#&gt; 
#&gt; attached base packages:
#&gt; [1] stats     graphics  grDevices utils     datasets  methods   base     
#&gt; 
#&gt; other attached packages:
#&gt; [1] performance_0.10.0 domir_1.0.1        DirichletReg_0.7-1 Formula_1.2-4     
#&gt; 
#&gt; loaded via a namespace (and not attached):
#&gt;  [1] rstudioapi_0.13  knitr_1.38       magrittr_2.0.3   insight_0.19.1  
#&gt;  [5] lattice_0.20-44  rlang_1.1.0      fastmap_1.1.0    stringr_1.5.0   
#&gt;  [9] highr_0.9        tools_4.1.0      grid_4.1.0       xfun_0.30       
#&gt; [13] cli_3.6.0        withr_2.5.0      htmltools_0.5.2  maxLik_1.5-2    
#&gt; [17] miscTools_0.6-28 yaml_2.3.5       digest_0.6.29    lifecycle_1.0.3 
#&gt; [21] vctrs_0.6.1      fs_1.5.2         glue_1.6.2       evaluate_0.15   
#&gt; [25] rmarkdown_2.13   sandwich_3.0-1   reprex_2.0.1     stringi_1.7.6   
#&gt; [29] compiler_4.1.0   generics_0.1.2   zoo_1.8-9

<sup>Created on 2023-07-27 by the reprex package (v2.0.1)</sup>

References

Luchman Relative Importance Analysis With Multicategory Dependent Variables:: An Extension and Review of Best Practices (2014) Organizational research methods

Douma & Weedon. Analysing continuous proportions in ecology and evolution: A practical introduction to beta and Dirichlet regression (2019) Methods in Ecology and Evolution

EDIT (31 July 2023)

Many thanks to Joseph Luchman for the solution! I have modified the function to avoid the use of pipes, and I have added the formula for precision inside the paste0() call. Unfortunately, reprex::reprex() won't render the result, so I paste an screenshot below.

domir::domir(acc ~ dyslexia + iq, function(y)  {
iv &lt;- attr(terms(y), &quot;term.labels&quot;)
fml &lt;- paste0(&quot;acc ~ &quot;, paste0(iv, collapse = &quot;+&quot;), &quot;| dyslexia + iq&quot;, collapse = &quot;&quot;)
print(fml)
performance::r2( DirichReg(as.formula(fml), data = RS, model = &quot;alternative&quot;) )[[1]]})

Dominance analysis with Dirichlet regression: 公式语法相关的错误吗?

答案1

得分: 1

这里提到的问题是由domin中的list提示的,提交给fitstatlist长度为1。

将括号移动到正确的位置可以解决这个问题,但同时揭示了另一个问题,我认为这与DirichletReg::DirichReg的设计有关。

基本上,似乎DirichletReg::DirichReg无法接受懒惰求值的formula,而使用domin时需要这样的formula

例如,大多数具有formula的建模函数允许像这样的用法:

lapply(list(mpg ~ am, mpg ~ vs), lm, data = datasets::mtcars)

正如你在输出的Call部分中看到的那样,lm以灵活的方式接受参数,并在需要时对formula进行求值,应用于数据。

但是,当尝试使用DirichReg时,使用焦点模型结果的部分会出现错误:

lapply(list(acc ~ dyslexia, acc ~ iq), DirichReg, data = RS, model = "alternative")

实际上,DirichReg需要将formula作为字符串进行解析(它使用match.call来解析要处理的参数;至少我认为这是问题所在)。

解决这个问题稍微复杂一些。必须动态地将dominformula(或在下面的示例中使用更新的domir::domir)提交给每个函数调用,以重构一个字符串formula,然后在提交给DirichReg时使用as.formula进行解释。生成的公式也会被打印出来。

domir(acc ~ dyslexia + iq, function(y)  {
    iv <- terms(y) |> attr("term.labels")
    fml <- paste0("acc ~ ", paste0(iv, collapse = "+"), collapse = "")
    print(fml)
    DirichReg(as.formula(fml), data = RS, model = "alternative") |> performance::r2() |> _[[1]]})

总体值:0.6568343

一般优势值:
一般优势 标准化排名
dyslexia 0.4983012 0.7586406 1
iq 0.1585332 0.2413594 2

条件优势值:
子集大小:1 子集大小:2
dyslexia 0.6498178 0.346784532
iq 0.3100498 0.007016514

完全优势指定:
Dmnated?dyslexia Dmnated?iq
Dmnates?dyslexia NA TRUE
Dmnates?iq FALSE NA

英文:

The issue asked about here is hinted at by domin as the list submitted to fitstat is length 1.

&gt; list(\(x) list(r2.nagelkerke = as.numeric(performance::r2(x)), &quot;r2.nagelkerke&quot;))
[[1]]
\(x) list(r2.nagelkerke = as.numeric(performance::r2(x)), &quot;r2.nagelkerke&quot;)

Moving the parentheses fixes it but reveals another that, I believe, is related to the design on DirichletReg::DirichReg.

&gt; domir::domin(acc ~ dyslexia + iq,
+              reg =  function(y)  DirichletReg::DirichReg(y, data = RS, model = &quot;alternative&quot;),
+              fitstat = list(\(x) list(r2.nagelkerke = as.numeric(performance::r2(x))), &quot;r2.nagelkerke&quot;)
+ )
Error in x$formula : object of type &#39;symbol&#39; is not subsettable

Basically, it seems that DirichletReg::DirichReg cannot accept a lazily evaluated formula that is required for using domin.

For instance, most modeling functions with a formula allow for something like:

&gt; lapply(list(mpg ~ am, mpg ~ vs), lm, data = datasets::mtcars)
[[1]]
Call:
FUN(formula = X[[i]], data = ..1)
Coefficients:
(Intercept)           am  
17.147        7.245  
[[2]]
Call:
FUN(formula = X[[i]], data = ..1)
Coefficients:
(Intercept)           vs  
16.62         7.94  

As you can see in the Call portion of the output, lm accepts arguments in a flexible way and evaluates the formula when it needs to, as applied to the data.

When trying something similar with DirichReg using parts of the focal model results in:

&gt; lapply(list(acc ~ dyslexia, acc ~ iq), DirichReg, data = RS, model = &quot;alternative&quot;)
Error in eval(x) : object &#39;X&#39; not found

DirichReg actually needs to 'see' the formula as a string (as it uses match.call to parse arguments for processing; at least I believe this is the issue).

The resolution to this one is a little more complex. Have to, on the fly, take the formula domin (or in the case below, I use the more updated domir::domir; also note I'm using R v4.3 to allow the element selection with the base R pipe) submits to each function call to reconstruct a string formula that is then interpreted as.formula when submitted to DirichReg in the example below. The formulas produced are also printed.

&gt; domir(acc ~ dyslexia + iq, function(y)  {
+     iv &lt;- terms(y) |&gt; attr(&quot;term.labels&quot;)
+     fml &lt;- paste0(&quot;acc ~ &quot;, paste0(iv, collapse = &quot;+&quot;), collapse = &quot;&quot;)
+     print(fml)
+     DirichReg(as.formula(fml), data = RS, model = &quot;alternative&quot;) |&gt; performance::r2() |&gt; _[[1]]})
[1] &quot;acc ~ dyslexia+iq&quot;
[1] &quot;acc ~ dyslexia&quot;
[1] &quot;acc ~ iq&quot;
Overall Value:      0.6568343 
General Dominance Values:
General Dominance Standardized Ranks
dyslexia         0.4983012    0.7586406     1
iq               0.1585332    0.2413594     2
Conditional Dominance Values:
Subset Size: 1 Subset Size: 2
dyslexia      0.6498178    0.346784532
iq            0.3100498    0.007016514
Complete Dominance Designations:
Dmnated?dyslexia Dmnated?iq
Dmnates?dyslexia               NA       TRUE
Dmnates?iq                  FALSE         NA

huangapple
  • 本文由 发表于 2023年7月27日 17:32:59
  • 转载请务必保留本文链接:https://go.coder-hub.com/76789117.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定