Different tests for different variables of the same data type in gtsummary
Question
I have a data frame and want to report p-values in a baseline table using gtsummary, applying different tests to different variables of the same data type (e.g., Fisher's exact test for some categorical variables and the chi-square test for others).
For example,
# create example data
set.seed(123)
mydata <- data.frame(a = sample(c("Yes", "No"), 100, replace = TRUE),
                     b = sample(c("Yes", "No"), 100, replace = TRUE),
                     c = sample(c("Yes", "No"), 100, replace = TRUE),
                     d = sample(c("Low", "Medium", "High"), 100, replace = TRUE),
                     e = sample(c("Group 1", "Group 2", "Group 3"), 100, replace = TRUE),
                     f = sample(c("Male", "Female"), 100, replace = TRUE),
                     g = rnorm(100),
                     h = rnorm(100))
I want b and c to be tested with fisher.test, and d, e, and f with chisq.test (grouped by a).
I tried:
library(gtsummary)

mydata %>%
  tbl_summary(
    by = a,
    statistic = list(all_continuous() ~ "{median} ({p25}-{p75})",
                     all_categorical() ~ "{n} / {N} ({p}%)")
  ) %>%
  add_p(all_continuous() ~ 't.test',
        all_categorical(-c('b', 'c')) ~ "chisq.test",
        c('b', 'c') ~ "fisher.test",
        pvalue_fun = function(x) style_number(x, digits = 3))
This does not work. I suspect something is wrong with all_categorical(-c('b', 'c')), but is there a quick way to remove specific variables from all_categorical()?
A more advanced question: how can I let the function detect which test is most appropriate? I found that add_p() does not automatically switch to Fisher's exact test when the data do not meet the assumptions of the chi-square test.
Thank you all for the kind help!
Answer 1
Score: 0
You need to pass the formulas to the test argument of add_p() explicitly. Also, I am not sure about the syntax all_categorical(-c('b','c')), so I changed it below.
library(gtsummary)

mydata %>%
  tbl_summary(
    by = a,
    statistic = list(all_continuous() ~ "{median} ({p25}-{p75})",
                     all_categorical() ~ "{n} / {N} ({p}%)")
  ) %>%
  # pvalue_fun is an argument of add_p() itself, not an element of the test list
  add_p(test = list(c("g", "h") ~ "t.test",
                    c("d", "e", "f") ~ "chisq.test",
                    c("b", "c") ~ "fisher.test"),
        pvalue_fun = function(x) style_number(x, digits = 3))
When I tried -c("b","c") for the chi-square test, the selection also included the continuous variables. So above I explicitly listed every variable for each test. If you need it to be more dynamic, I can amend the code.
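As a rough cross-check outside gtsummary, the explicit mapping above corresponds approximately to running the base R tests on each variable cross-tabulated against a. This is a sketch that assumes gtsummary's "fisher.test", "chisq.test" and "t.test" wrap stats::fisher.test(), stats::chisq.test() and stats::t.test() with their default arguments; the p-values shown in the table may differ slightly if gtsummary passes other options.

```r
# Recreate the example data from the question
set.seed(123)
mydata <- data.frame(a = sample(c("Yes", "No"), 100, replace = TRUE),
                     b = sample(c("Yes", "No"), 100, replace = TRUE),
                     c = sample(c("Yes", "No"), 100, replace = TRUE),
                     d = sample(c("Low", "Medium", "High"), 100, replace = TRUE),
                     e = sample(c("Group 1", "Group 2", "Group 3"), 100, replace = TRUE),
                     f = sample(c("Male", "Female"), 100, replace = TRUE),
                     g = rnorm(100),
                     h = rnorm(100))

# Fisher's exact test for b and c, each cross-tabulated against the group a
p_fisher <- sapply(c("b", "c"), function(v) {
  fisher.test(table(mydata[[v]], mydata$a))$p.value
})

# Chi-square test with stats defaults (Yates correction applies to 2x2 tables)
p_chisq <- sapply(c("d", "e", "f"), function(v) {
  chisq.test(table(mydata[[v]], mydata$a))$p.value
})

# Welch two-sample t-test (stats default) for the continuous g and h
p_ttest <- sapply(c("g", "h"), function(v) {
  groups <- split(mydata[[v]], mydata$a)
  t.test(groups$No, groups$Yes)$p.value
})

# Named vector of p-values, one per variable
round(c(p_fisher, p_chisq, p_ttest), 3)
```

Comparing these numbers with the add_p() column is a quick way to confirm that each variable ended up with the intended test.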
Edit:
First, identify all categorical variables and save them as colchar. Then remove from that list the by variable (a) and the two variables you want to use Fisher's test for (b and c), which gives colchar_chisq. Finally, pass colchar_chisq (wrapped in all_of()) as the list of variables in the test argument of add_p().
# Character columns are the categorical variables in this data
colchar <- colnames(mydata)[sapply(mydata, is.character)]
# Drop the by variable (a) and the Fisher variables (b, c)
colchar_chisq <- setdiff(colchar, c("a", "b", "c"))

mydata %>%
  tbl_summary(
    by = a,
    statistic = list(all_continuous() ~ "{median} ({p25}-{p75})",
                     all_categorical() ~ "{n} / {N} ({p}%)")
  ) %>%
  add_p(test = list(all_continuous() ~ "t.test",
                    all_of(colchar_chisq) ~ "chisq.test",
                    c("b", "c") ~ "fisher.test"),
        pvalue_fun = function(x) style_number(x, digits = 3))
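For the second part of the question (letting the data choose the test), one option is to extend the colchar idea with an explicit rule of thumb: use Fisher's exact test whenever any expected cell count is below 5, otherwise the chi-square test. This is a base R sketch only; choose_cat_test(), by_var and the threshold of 5 are illustrative assumptions, not gtsummary functionality. The resulting groups of names could then be passed to add_p(test = ...) via all_of(), as in the edit. As I understand the gtsummary documentation, leaving test unspecified applies a similar default rule to categorical variables (chi-square when all expected counts are at least 5, Fisher otherwise), so it may be worth checking whether your installed version already does this before hand-rolling it.

```r
# Recreate the example data; stringsAsFactors = FALSE keeps the categorical
# columns as character on R versions before 4.0 as well
set.seed(123)
mydata <- data.frame(a = sample(c("Yes", "No"), 100, replace = TRUE),
                     b = sample(c("Yes", "No"), 100, replace = TRUE),
                     c = sample(c("Yes", "No"), 100, replace = TRUE),
                     d = sample(c("Low", "Medium", "High"), 100, replace = TRUE),
                     e = sample(c("Group 1", "Group 2", "Group 3"), 100, replace = TRUE),
                     f = sample(c("Male", "Female"), 100, replace = TRUE),
                     g = rnorm(100),
                     h = rnorm(100),
                     stringsAsFactors = FALSE)

# Hypothetical helper (not part of gtsummary): return "fisher.test" when any
# expected cell count of the x-by-group table is below min_expected,
# otherwise "chisq.test"
choose_cat_test <- function(x, by, min_expected = 5) {
  tab <- table(x, by)
  # Expected counts under independence: row total * column total / grand total
  expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
  if (any(expected < min_expected)) "fisher.test" else "chisq.test"
}

# Same first step as the edit: all character columns except the by variable
by_var <- "a"
colchar <- colnames(mydata)[sapply(mydata, is.character)]
cat_vars <- setdiff(colchar, by_var)

# Let the data pick a test for each categorical variable
chosen <- vapply(cat_vars,
                 function(v) choose_cat_test(mydata[[v]], mydata[[by_var]]),
                 character(1))
chosen

# Group variable names by test, ready to wrap in all_of() for add_p(test = ...)
split(names(chosen), chosen)

# The helper on a tiny table where every expected count is 1.5
choose_cat_test(c("Yes", "Yes", "Yes", "No", "No", "No"),
                c("A", "A", "B", "B", "A", "B"))
# → "fisher.test"
```

Note that the threshold rule is a convention, not a guarantee; some analysts simply use Fisher's exact test for all small 2x2 tables regardless of expected counts.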