March 15, 2023 21:35:48

Different tests for different variables of the same data type in gtsummary

Question


I have a data frame and want to present the p-values of a baseline table using gtsummary, with different tests for different variables of the same data type (e.g., Fisher's exact test for some categorical variables and the chi-square test for others).

For example:

    # create example data
    set.seed(123)
    mydata <- data.frame(a = sample(c("Yes", "No"), 100, replace = TRUE),
                         b = sample(c("Yes", "No"), 100, replace = TRUE),
                         c = sample(c("Yes", "No"), 100, replace = TRUE),
                         d = sample(c("Low", "Medium", "High"), 100, replace = TRUE),
                         e = sample(c("Group 1", "Group 2", "Group 3"), 100, replace = TRUE),
                         f = sample(c("Male", "Female"), 100, replace = TRUE),
                         g = rnorm(100),
                         h = rnorm(100))

I would like b and c to be tested with fisher.test, and d, e, and f with chisq.test (all grouped by a).

I tried:

    mydata %>%
      tbl_summary(
        by = a,
        statistic = list(all_continuous() ~ "{median} ({p25}-{p75}",
                         all_categorical() ~ "{n} / {N} ({p}%)")
      ) %>%
      add_p(all_continuous() ~ 't.test',
            all_categorical(-c('b', 'c')) ~ "chisq.test",
            c('b', 'c') ~ "fisher.test",
            pvalue_fun = function(x) style_number(x, digits = 3))

This does not work. I guess something is wrong with all_categorical(-c('b', 'c')), but is there a quick way to remove certain variables from all_categorical()?

A more advanced question: how can I let the function detect which test is best to use? I found that add_p does not automatically use Fisher's exact test when the data do not meet the assumptions of the chi-square test.

Thank you all for the kind help!

Answer 1

Score: 0


You need to pass the formulas through the test argument of add_p(), as a list. Also, I am not sure about the syntax all_categorical(-c('b','c')), so I changed it below.

    library(gtsummary)

    mydata %>%
      tbl_summary(
        by = a,
        statistic = list(all_continuous() ~ "{median} ({p25}-{p75})",
                         all_categorical() ~ "{n} / {N} ({p}%)")
      ) %>%
      add_p(
        # the formulas go in a list passed to `test`
        test = list(c("g", "h") ~ "t.test",
                    c("d", "e", "f") ~ "chisq.test",
                    c("b", "c") ~ "fisher.test"),
        # pvalue_fun is a separate argument of add_p(), not an element of the test list
        pvalue_fun = function(x) style_number(x, digits = 3)
      )

Sorry, when I used -c("b","c") for the chi-square test, it also included the continuous variables in that group. Above, I explicitly listed the variables to include for each test. If you need it to be more dynamic, I can amend the code.
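A possible alternative to listing every variable, offered only as a sketch: as far as I understand gtsummary's formula-list syntax, when a variable is matched by more than one formula the later formula takes precedence. Under that assumption, you can assign chisq.test to all categorical variables first and then override b and c. The object name tbl is just a placeholder, and the default statistic is used to keep the example short.

```r
library(gtsummary)

# Example data from the question
set.seed(123)
mydata <- data.frame(
  a = sample(c("Yes", "No"), 100, replace = TRUE),
  b = sample(c("Yes", "No"), 100, replace = TRUE),
  c = sample(c("Yes", "No"), 100, replace = TRUE),
  d = sample(c("Low", "Medium", "High"), 100, replace = TRUE),
  e = sample(c("Group 1", "Group 2", "Group 3"), 100, replace = TRUE),
  f = sample(c("Male", "Female"), 100, replace = TRUE),
  g = rnorm(100),
  h = rnorm(100)
)

tbl <- mydata |>
  tbl_summary(by = a) |>
  add_p(
    test = list(
      all_continuous() ~ "t.test",
      all_categorical() ~ "chisq.test", # matches b, c, d, e and f
      c("b", "c") ~ "fisher.test"       # assumed to override the previous line for b and c
    ),
    pvalue_fun = function(x) style_number(x, digits = 3)
  )

# Print `tbl` to render the table
```

If the override does not behave this way in your gtsummary version, the explicit variable lists above remain the safe option.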

Edit:
First, identify all categorical variables and save them as colchar. Then, in a second step, remove from that list your by variable and the two variables you want to test with Fisher's exact test. Finally, pass colchar_chisq as the variable list in the test argument of add_p().

    # names of all character (categorical) columns
    colchar <- colnames(mydata)[sapply(mydata, is.character)]
    # drop the by variable (a) and the two Fisher-test variables (b, c)
    colchar_chisq <- setdiff(colchar, c("a", "b", "c"))

    mydata %>%
      tbl_summary(
        by = a,
        statistic = list(all_continuous() ~ "{median} ({p25}-{p75})",
                         all_categorical() ~ "{n} / {N} ({p}%)")
      ) %>%
      add_p(
        test = list(all_continuous() ~ "t.test",
                    all_of(colchar_chisq) ~ "chisq.test",
                    c("b", "c") ~ "fisher.test"),
        # pvalue_fun is a separate argument of add_p(), not an element of the test list
        pvalue_fun = function(x) style_number(x, digits = 3)
      )
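On the second part of the question (picking the test automatically), here is a rough base R sketch of one common rule of thumb: use Fisher's exact test when any expected cell count is below 5, and Pearson's chi-square test otherwise. The helper name auto_cat_test and the min_expected threshold are my own assumptions, not part of gtsummary. The (data, variable, by, ...) signature and the one-row data frame with a p.value column follow what I understand gtsummary expects from a custom test function.

```r
# Hypothetical helper: choose between Fisher's exact test and Pearson's
# chi-square test based on the expected cell counts of the contingency table.
auto_cat_test <- function(data, variable, by, ..., min_expected = 5) {
  tab <- table(data[[variable]], data[[by]])

  # Expected counts under independence: row total * column total / grand total
  expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

  if (any(expected < min_expected)) {
    res <- fisher.test(tab)
  } else {
    res <- chisq.test(tab, correct = FALSE)
  }

  # One-row data frame with the p-value and the name of the test that was used
  data.frame(p.value = res$p.value, method = res$method)
}

# Small sample: expected counts are below 5, so Fisher's exact test is chosen
small <- data.frame(x = c("a", "a", "b", "b", "a", "b"),
                    grp = c("y", "n", "y", "n", "y", "n"))
res_small <- auto_cat_test(small, "x", "grp")

# Larger sample: every expected count is 25, so the chi-square test is chosen
big <- data.frame(x = rep(c("a", "b"), each = 50),
                  grp = rep(c("y", "n"), times = 50))
res_big <- auto_cat_test(big, "x", "grp")
```

If custom tests work the way I expect, something like add_p(test = all_categorical() ~ "auto_cat_test") should plug this in, but treat that call as untested. Also note that, as far as I know from the add_p() documentation, gtsummary already applies a similar rule by default (chi-square test without continuity correction when all expected counts are at least 5, Fisher's exact test otherwise) for any variable whose test you do not specify, so simply leaving the categorical variables out of the test list may be enough.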
