英文:
Fitting several zero-inflated negbin models and get pooled results
问题
我有一个包含患者数据的数据集。其中一些患者在医院里住过多次,但为了保证观察相互独立(我尝试过使用多层模型,但这个数据不适用),我创建了一个数据集,对于每个有多次住院记录的患者,只随机选择一个住院记录。这样,我创建了100个包含这些患者随机住院记录的数据集。我的因变量是一个计数变量,零膨胀负二项模型最适合。
我已经成功地在每个数据集上运行了回归模型(数据集由变量“sample”标识),但我不知道如何获得这100个回归模型的汇总结果。我想要得到每个预测变量的计数模型和零膨胀模型的汇总结果。
我使用的包括:library(dplyr); library(tidyverse); library(pscl); library(broom); library(jtools); library(mice)
。
pool
函数来自mice
包。
我像这样创建了合并的数据集:
set.seed(12345)
Combined_randcase <- bind_rows(replicate(100, cohort_1 %>%
group_by(patient) %>%
slice_sample(n=1, replace = TRUE), simplify=F), .id="sample")
Combined_randcase <- data.frame(as.list(Combined_randcase))
我在每个数据集上运行了ZINB回归模型,按“sample”分组,如下(使用broom
包):
regr_comb_randcase.zeroinfl = Combined_randcase %>%
nest_by(sample) %>%
mutate(model = list(zeroinfl(formula = cm_number ~ after_wm + age + gender_male + ref_mode_police + ref_lg_invol + ref_reas_selfharm + ref_reas_aggrpers + comm_limited + duration_days + diagnosis_personality + diagnosis_psychosis + diagnosis_mania + diagnosis_substance + intoxication | age + gender_male + ref_mode_police + ref_lg_invol + ref_reas_selfharm + ref_reas_aggrpers + comm_limited + duration_days + diagnosis_personality + diagnosis_psychosis + diagnosis_mania + diagnosis_substance + intoxication, data = cohort_1, na.action = na.exclude, dist = "negbin"))) %>%
summarise(tidy(model))
这是我尝试获取汇总结果的方式:
models.zeroinfl <- regr_comb_randcase.zeroinfl$model
pool_results.zeroinfl <- pool(regr_comb_randcase.zeroinfl)
在运行第二行时,我收到了以下错误:
Error: No tidy method for objects of class character
对于另一个逻辑回归模型,我成功地做了以下操作:
regr_comb_randcase.log = Combined_randcase %>%
group_by(sample) %>%
do(model = glm(cm ~ after_wm + age + gender_male +ref_mode_police + ref_lg_invol + ref_reas_selfharm + ref_reas_aggrpers + comm_limited + duration_days + diagnosis_personality + diagnosis_psychosis + diagnosis_mania + diagnosis_substance + intoxication, data = ., family = binomial()))
models <- regr_comb_randcase.log$model
pool_results <- pool(models)
summary(pool_results)
希望这能帮到你。
英文:
I have a dataset with patient data. Some of the patients have multiple stays in the hospital, but for the observations to be independent (I tried a multi-level model, but it's not possible with this data) I created a dataset, where for each patient that has multiple stays only one stay is selected randomly. Like this, I created 100 datasets with random stays for these patients. My dependent variable is a count variable and a zero-inflated negative binomial model fits best.
I already managed to run the regression model on each of the datasets (the datasets are identified by the variable "sample"), but I don't know how to get a pooled result for all of these 100 regressions. I would like to get the pooled results of the count model and of the zero-inflated model for every predictor.
I'm using:
library(dplyr); library(tidyverse); library(pscl); library(broom); library(jtools); library(mice)
The pool
function is from mice
.
I created the combined dataset like this:
set.seed(12345)
Combined_randcase <- bind_rows(replicate(100, cohort_1 %>% group_by(patient) %>%
slice_sample(n=1, replace = TRUE), simplify=F), .id="sample")
Combined_randcase <- data.frame(as.list(Combined_randcase))
I ran the ZINB regression model on each dataset, grouped by "sample", like this (using broom
package):
regr_comb_randcase.zeroinfl = Combined_randcase %>%
nest_by(sample) %>%
mutate(model = list(zeroinfl(formula = cm_number ~ after_wm + age + gender_male + ref_mode_police + ref_lg_invol + ref_reas_selfharm + ref_reas_aggrpers + comm_limited + duration_days + diagnosis_personality + diagnosis_psychosis + diagnosis_mania + diagnosis_substance + intoxication | age + gender_male + ref_mode_police + ref_lg_invol + ref_reas_selfharm + ref_reas_aggrpers + comm_limited + duration_days + diagnosis_personality + diagnosis_psychosis + diagnosis_mania + diagnosis_substance + intoxication, data = cohort_1, na.action = na.exclude, dist = "negbin")))
%>%
summarise(tidy(model))
That's how I tried to get pooled results:
models.zeroinfl <- regr_comb_randcase.zeroinfl$model
pool_results.zeroinfl <- pool(regr_comb_randcase.zeroinfl)
When running the second line, I get this error:
Error: No tidy method for objects of class character
For another logistic regression model, I did this successfully:
regr_comb_randcase.log = Combined_randcase %>%
group_by(sample) %>%
do(model = glm(cm ~ after_wm + age + gender_male +ref_mode_police + ref_lg_invol + ref_reas_selfharm + ref_reas_aggrpers + comm_limited + duration_days + diagnosis_personality + diagnosis_psychosis + diagnosis_mania + diagnosis_substance + intoxication, data = ., family = binomial()))
models <- regr_comb_randcase.log$model
pool_results <- pool(models)
summary(pool_results)
Output of dput(cohort_1_example)
(a shortened version of my dataset) for reproducibility:
structure(list(case = c("20001879", "20009253", "20003748", "20002321",
"20001662", "1910967", "20008058", "20010686", "20010938", "20009508",
"20002307", "20010105", "210098181", "21009818", "210100261",
"21010026", "21000865", "21002199", "1906803", "1907642", "20008274",
"21000858", "21004557", "1910669", "21004451", "21000202", "21000812",
"21001006", "21001143", "21001423", "1906820", "21000448", "21002128",
"21002666", "21003560", "1907070", "20011121", "1907614", "20002748",
"20010645", "21001363", "1908906", "1910981", "1905926", "21002429",
"21004264", "20011209", "20010442", "20009977", "1906382", "1909409",
"1908904", "1910516", "20001534", "20011201", "1907432", "1908332",
"1906356", "20011026", "20008206", "20000809", "1910664", "1908673",
"1907610", "1911046", "20008505", "20009385", "21000530", "1909620",
"1909730", "1910988", "20009899", "1907282", "1906671", "20007870",
"1910749", "20010782", "20009808", "20003311", "1910722", "1910529",
"1906638", "1906861", "1906956", "1910743", "20002057", "21000891",
"20010349", "20008503", "1906093", "1910662", "20008093", "20010683",
"20008787", "20003631", "20007796", "20008089", "21004141", "20010177",
"20001316", "1909809", "20001875", "20009552", "20001443", "21000419",
"20003106", "1909773", "21004600", "20008105", "21002070", "1908245",
"1909860", "21004209", "21003022", "20003151", "20011037", "21001966",
"20009902", "1906202", "1910009", "1910777", "20010294", "1910072"
), patient = c("10", "11", "100", "100", "101", "102", "103",
"105", "106", "107", "108", "11", "11", "11", "11", "11", "1000",
"1001", "1002", "1002", "1003", "1003", "1004", "1005", "1005",
"1006", "1008", "1009", "1009", "1009", "1011", "1012", "1013",
"1013", "1013", "1014", "1016", "1017", "1018", "1020", "1020",
"1021", "1022", "1023", "1026", "1026", "1029", "1030", "1033",
"1035", "1036", "1037", "1037", "1037", "1039", "1041", "1041",
"1042", "1042", "1043", "1044", "1045", "1046", "1047", "1048",
"1049", "1050", "1053", "1054", "1056", "1056", "1057", "1058",
"1060", "1061", "1064", "1064", "1064", "1065", "1066", "1067",
"1067", "1067", "1067", "1069", "1071", "1072", "1073", "1074",
"1075", "1075", "1076", "1077", "1078", "1079", "1080", "1081",
"1082", "1083", "1086", "1087", "1087", "1088", "1089", "1089",
"1090", "1091", "1091", "1092", "1093", "1094", "1094", "1095",
"1096", "1098", "1098", "1098", "1099", "1048", "1048", "1021",
"1018", "1011"), cm = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0,
0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0,
0, 0, 0, 0, 0), cm_number = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 3, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 1, 5, 0, 0, 0, 0, 0, 0, 1,
0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 2, 2, 0, 0, 1, 0, 0, 0, 0, 1,
0, 0, 0, 0, 0, 0, 0), total_cm_duration = c(0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 165.000000003492, 0, 0, 0, 0, 0, 0,
0, 0, 174.999999994179, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 259.999999998836,
720, 0, 0, 0, 0, 0, 60.0000000069849, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 815.000000005821, 0, 0, 10865.0000000023,
0, 0, 0, 0, 0, 0, 420.000000006985, 0, 0, 0, 0, 0, 200.000000002328,
0, 0, 0, 0, 0, 0, 239.999999996508, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 1084.99999999534, 0, 0, 145.000000001164, 0, 0, 789.999999997672,
435.000000003492, 0, 0, 60.0000000069849, 0, 0, 0, 0, 775.000000001164,
0, 0, 0, 0, 0, 0, 0), after_wm = c(0, 1, 0, 0, 0, 0, 1, 1, 1,
1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1,
0, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0,
0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1,
0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1,
1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1,
0, 1, 1, 1, 0, 0, 0, 1, 0), age = c(26, 27, 53, 53, 26, 28, 30,
57, 39, 50, 49, 27, 28, 28, 27, 27, 89, 18, 22, 22, 21, 21, 58,
35, 36, 63, 44, 35, 35, 35, 25, 24, 36, 36, 36, 62, 50, 21, 55,
23, 23, 44, 53, 71, 39, 39, 79, 47, 81, 43, 39, 21, 22, 22, 79,
22, 22, 33, 35, 86, 27, 42, 20, 30, 25, 22, 26, 62, 54, 46, 46,
46, 79, 39, 21, 63, 64, 64, 31, 59, 70, 70, 70, 70, 49, 37, 49,
63, 74, 38, 39, 74, 50, 72, 61, 80, 51, 45, 67, 45, 76, 76, 61,
30, 31, 35, 48, 49, 45, 30, 76, 76, 20, 18, 20, 20, 21, 51, 24,
24, 45, 55, 25), gender_male = c(1, 1, 1, 1, 1, 1, 0, 0, 0, 0,
1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0,
1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0,
0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1,
0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1,
0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1,
1, 1, 1, 0, 0, 0, 0, 0), ref_mode_police = c(0, 0, 0, 0, 0, 0,
1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1,
1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0,
0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0,
0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0,
0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0), ref_lg_invol = c(0, 0, 0,
0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0,
1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1,
1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1,
0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0), ref_reas_selfharm = c(1,
1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1,
1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 0,
1, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 0,
0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1,
1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1,
0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0), ref_reas_aggrpers = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0,
1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0,
0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0), comm_limited = c(0,
0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0,
0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0,
1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), duration_days = c(41,
1, 2, 42, 3, 46, 3, 8, 1, 1, 21, 64, 25, 2, 25, 2, 22, 26, 7,
29, 101, 119, 153, 2, 2, 51, 10, 2, 6, 49, 83, 5, 1, 8, 1, 36,
71, 1, 7, 9, 166, 41, 2, 76, 12, 1, 25, 40, 4, 0, 2, 28, 1, 3,
49, 29, 54, 95, 119, 29, 28, 26, 43, 1, 15, 121, 22, 28, 73,
13, 39, 1, 119, 14, 73, 18, 124, 32, 2, 120, 67, 2, 2, 8, 29,
27, 34, 32, 112, 6, 8, 38, 118, 24, 38, 20, 2, 1, 2, 9, 1, 21,
42, 57, 49, 1, 1, 1, 35, 2, 45, 23, 64, 29, 2, 6, 56, 0, 5, 3,
58, 51, 2), diagnosis_personality = c(0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0,
1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1,
0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0), diagnosis_psychosis = c(1, 1,
1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0,
0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1,
1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1,
1, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0,
1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1,
1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1), diagnosis_mania = c(0,
0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0,
0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), diagnosis_substance = c(0,
0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0,
1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1,
0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1,
0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1,
0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1), intoxication = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0,
0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1,
0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0,
0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1,
1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1)), row.names = c(NA,
-123L), class = c("tbl_df", "tbl", "data.frame"))
I could find information on doing this with a linear and a logistic model, but not on a zero-inflated negbin model. Maybe that's why tidy is not working? Any help is much appreciated.
答案1
得分: 1
可以将混合/多层模型拟合到具有单例组的数据中;主要约束是是否有足够的总体信息来估计组间方差。
在主流软件包中,lme4
可以拟合线性混合模型 (LMM)、逻辑混合效应模型 (GLMM) 和负二项式混合模型(尽管速度较慢,没有经过大量工作的零膨胀混合模型,可以参考这里)。glmmTMB
可以处理上述所有情况。brms
可以做glmmTMB
能做的一切,还可以加入更多的内容,但速度较慢(因为贝叶斯MCMC),可能需要深入了解贝叶斯/MCMC抽样的细节。
以下示例使用您的数据子集,并采用大大简化的模型。您的样本数据中有86名患者,61名单例,总共有123个观察值。这似乎足够拟合简化模型(如下);但对于ZINB拟合(零膨胀项趋于零概率,NB离散参数趋于泊松分布)和逻辑拟合(奇异拟合,即患者间方差趋于零),我们会遇到一些典型的数据不足问题。如果您的完整数据集较大,这些问题发生的可能性要小得多...
除了加载软件包之外,下面的代码主要是一些样式的处理,以避免过多的重复代码,并使跨所有模型包括的公共预测变量更容易看到。
library(lme4)
library(glmmTMB)
## 所有固定效应预测变量的列表
fix_vars <- c("after_wm", "age", "gender_male", "ref_mode_police")
## 'resp'是响应变量的名称(字符)
ff <- function(resp) {
reformulate(c(fix_vars, "(1|patient)"), resp = resp)
}
## 高斯/LMM
lme4::lmer(ff("total_cm_duration"), data = dd)
## NB/无零膨胀
glmmTMB::glmmTMB(ff("cm_number"),
family = nbinom2,
data = dd)
## NB与(简单的)零膨胀
glmmTMB::glmmTMB(ff("cm_number"),
family = nbinom2,
## 可以使用 zi = ff(NULL) 来包括所有固定效应预测变量作为零膨胀预测变量...
zi = ~1,
data = dd)
## 逻辑
lme4::glmer(ff("cm"),
family = binomial,
data = dd)
英文:
We can fit mixed/multilevel models to data with singleton groups; the main constraint is whether there is enough information overall to make estimating the among-group variances feasible.
Among mainstream packages, lme4
can fit LMMs, logistic GLMMs, and negative binomial mixed models (albeit a bit slowly, and not zero-inflated mixed models without a lot of work: see e.g. here). glmmTMB
can handle all of the above. brms
can do anything glmmTMB
can do, plus the kitchen sink, but is slower (because Bayesian MCMC) and may need you to get into the weeds of Bayesian/MCMC sampling.
The examples below use your data subset, with considerably simplified models. Your sample data has 86 patients, 61 singletons, 123 total observations. This seems to be nearly enough to fit the simplified models (below); we do run into some of the typical not-quite-enough-data problems with the ZINB fit (zero-inflation term converges to zero probability, NB dispersion parameter converges to a Poisson distribution) and the logistic fit (singular fit, i.e. the among-patient variance converges to zero). These problems are much less likely to occur if your full data set is large ...
The first bit of machinery (besides loading packages) is cosmetic, to avoid too much repetitive code and make it easier to see what common set of predictors is included across all models.
library(lme4)
library(glmmTMB)
## list of all of the fixed effect predictors
fix_vars <- c("after_wm", "age", "gender_male", "ref_mode_police")
## 'resp' is the name of the response variable (character)
ff <- function(resp) {
reformulate(c(fix_vars, "(1|patient)"), resp = resp)
}
## Gaussian/LMM
lme4::lmer(ff("total_cm_duration"), data = dd)
## NB/no zero-inflation
glmmTMB::glmmTMB(ff("cm_number"),
family = nbinom2,
data = dd)
## NB with (simple) zero-inflation
glmmTMB::glmmTMB(ff("cm_number"),
family = nbinom2,
## could use zi = ff(NULL) to include all FE predictors
## as ZI predictors as well ...
zi = ~1,
data = dd)
## logistic
lme4::glmer(ff("cm"),
family = binomial,
data = dd)
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论