How to mask certain cells' info in a gtsummary table
Question
I have some sensitive info in my data that I need to hide when it falls below a certain threshold (to comply with a data use agreement (DUA) and prevent re-identification). I'm using tbl_svysummary() from gtsummary. In my example, I'd like to suppress cells with a size ≤ 100:
library(gtsummary)
library(survey)

# Survey design built from the Titanic counts, weighted by Freq
tbl <-
  svydesign(~1, data = as.data.frame(Titanic), weights = ~Freq) %>%
  tbl_svysummary(by = Survived, percent = "row", include = c(Class, Age))
tbl
I want to show child info as:
EDIT: Reproducible example for multiple stat_ columns and 0/1 variables:
library(gtsummary)
library(survey)
library(dplyr)  # for mutate()

supp_outcomes <-
  as.data.frame(Titanic) %>%
  mutate(Female = ifelse(Sex == "Female", 1, 0)) %>%  # 0/1 indicator
  svydesign(~1, data = ., weights = ~Freq) %>%
  tbl_svysummary(by = Survived, percent = "row",
                 include = c(Age, Female, Class)) %>%
  add_overall() %>%
  add_p()
supp_outcomes
The edited code from @Marco's suggested solution:
library(dplyr)

# Replace a formatted "n (%)" cell with "TOO FEW" when its own count is
# above 0 but below 200 (categorical and dichotomous rows only)
mask_small <- function(stat, var_type, threshold = 200) {
  # Count = text before " (", thousands separators removed; NA for label rows
  n <- suppressWarnings(as.numeric(gsub(",", "", sub(" \\(.*$", "", stat))))
  case_when(
    n < threshold & n > 0 & var_type %in% c("dichotomous", "categorical") ~ "TOO FEW",
    TRUE ~ stat
  )
}

# Check each stat_ column (overall, Survived = No, Survived = Yes) against its own count
supp_outcomes$table_body <- supp_outcomes$table_body %>%
  mutate(across(c(stat_0, stat_1, stat_2), ~ mask_small(.x, var_type)))
supp_outcomes
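When a table has several stat_ columns (for example after add_overall()), each column should be compared against its own count, so that a small cell in one column is masked even if the neighbouring cells are large. The base-R sketch below shows that per-column check on a plain data frame. It is an illustration under stated assumptions, not gtsummary API: `mask_column()` is a hypothetical helper, cells are assumed to be strings whose leading number is the count (as in "100 (8%)"), the threshold of 200 and the "count above 0" rule follow the code above, and the cell values are made up, not real Titanic output.

```r
# Hypothetical helper: mask one formatted column using that column's own counts
mask_column <- function(stat, var_type, threshold = 200,
                        types = c("dichotomous", "categorical")) {
  # Leading count of "n (%)" text with thousands separators removed; NA if absent
  n <- suppressWarnings(as.numeric(gsub(",", "", sub(" \\(.*$", "", stat))))
  # !is.na(n) guards the comparisons so NA cells (label rows) are never masked
  hide <- !is.na(n) & n > 0 & n < threshold & var_type %in% types
  ifelse(hide, "TOO FEW", stat)
}

# Made-up cells for illustration only (not real gtsummary output)
body <- data.frame(
  var_type = c("categorical", "categorical", "dichotomous", "continuous"),
  stat_0   = c("1,200 (100%)", "250 (100%)", "300 (100%)", "35 (22, 48)"),
  stat_1   = c("1,100 (92%)", "200 (80%)", "0 (0%)", "30 (20, 45)"),
  stat_2   = c("100 (8%)", "50 (20%)", "300 (100%)", "40 (25, 50)"),
  stringsAsFactors = FALSE
)

# Apply the check to every stat_ column independently
stat_cols <- grep("^stat_", names(body), value = TRUE)
body[stat_cols] <- lapply(body[stat_cols], mask_column, var_type = body$var_type)
print(body)
```

Only the two small stat_2 cells are replaced: the count of exactly 200 is kept because the comparison is strict, the zero cell is kept because of the "above 0" rule, and the continuous row is never touched.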
Answer 1
Score: 1
You can apply further tidyverse manipulation to the output object's table_body, like this:
library(tidyverse)
library(gtsummary)
data(mtcars)

output <- mtcars[, 1:2] %>% tbl_summary()
output$table_body
#> # A tibble: 5 × 6
#>   variable var_type    var_label row_type label stat_0
#>   <chr>    <chr>       <chr>     <chr>    <chr> <chr>
#> 1 mpg      continuous  mpg       label    mpg   19.2 (15.4, 22.8)
#> 2 cyl      categorical cyl       label    cyl   NA
#> 3 cyl      categorical cyl       level    4     11 (34%)
#> 4 cyl      categorical cyl       level    6     7 (22%)
#> 5 cyl      categorical cyl       level    8     14 (44%)

# Replace a categorical cell with "TOO FEW" when its count is below 10
output$table_body <- output$table_body %>%
  mutate(tmp = stat_0) %>%
  # Keep only the text before " (", i.e. the count; quietly drop the rest
  separate(tmp, c("number"), sep = " \\(", extra = "drop") %>%
  mutate(number = as.numeric(number)) %>%
  mutate(stat_0 = case_when(number < 10 & var_type == "categorical" ~ "TOO FEW",
                            TRUE ~ stat_0)) %>%
  select(!number)
output
The condition in case_when targets categorical rows and replaces a cell with "TOO FEW" when its count is below the threshold value. You can adjust the condition for any other variable type or criterion.
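The same rule can be packaged as a small function so that the threshold, the variable types and the replacement text become parameters. This is a minimal base-R sketch, not part of gtsummary: `mask_small_cells()` is a hypothetical helper, and it assumes each cell is a string whose leading number is the count (as in "7 (22%)"). It strips thousands separators such as "1,490" before parsing and leaves NA cells (label rows) untouched. Here it is applied to the table_body values printed in the answer, rebuilt by hand as a plain data frame.

```r
# Hypothetical helper: replace formatted "n (%)" cells whose count is below
# `threshold` with `label`, for the variable types listed in `types`
mask_small_cells <- function(stat, var_type, threshold = 10,
                             types = c("categorical", "dichotomous"),
                             label = "TOO FEW") {
  # Text before the first " (" is the count; drop thousands separators
  count_txt <- gsub(",", "", sub(" \\(.*$", "", stat))
  # Text that is not a number becomes NA (and is therefore never masked)
  count <- suppressWarnings(as.numeric(count_txt))
  to_mask <- !is.na(count) & count < threshold & var_type %in% types
  stat[to_mask] <- label
  stat
}

# table_body values from the mtcars example, rebuilt by hand
body <- data.frame(
  variable = c("mpg", "cyl", "cyl", "cyl", "cyl"),
  var_type = c("continuous", rep("categorical", 4)),
  row_type = c("label", "label", "level", "level", "level"),
  label    = c("mpg", "cyl", "4", "6", "8"),
  stat_0   = c("19.2 (15.4, 22.8)", NA, "11 (34%)", "7 (22%)", "14 (44%)"),
  stringsAsFactors = FALSE
)

body$stat_0 <- mask_small_cells(body$stat_0, body$var_type, threshold = 10)
print(body$stat_0)
```

Only the cyl = 6 cell (count 7) is replaced; the continuous mpg row is skipped because its var_type is not in `types`. With a real gtsummary object, the same function could be called inside mutate() on table_body, as in the answer above.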