英文:
Mutate case_when nested conditional labelling
问题
以下是代码部分的翻译:
# 创建一个新的变量来根据测试表现的变化对每个个体进行分类
df2 <- df %>%
mutate(test1_rankgroup = case_when(
all(test1_rank == "top") ~ "stable(top)",
all(test1_rank == "bottom") ~ "stable(bottom)",
test1_rank[1] == "top" & test1_rank[2] == "mid" & test1_rank[3] == "bottom" ~ "gradual",
test1_rank[1] == "top" & (test1_rank[2] == "bottom" | test1_rank[3] == "bottom") ~ "rapid",
TRUE ~ "Other"
),
test2_rankgroup = case_when(
all(test2_rank == "top") ~ "stable(top)",
all(test2_rank == "bottom") ~ "stable(bottom)",
test2_rank[1] == "top" & test2_rank[2] == "mid" & test2_rank[3] == "bottom" ~ "gradual",
test2_rank[1] == "top" & (test2_rank[2] == "bottom" | test2_rank[3] == "bottom") ~ "rapid",
TRUE ~ "Other"
))
希望这能帮助您完成您的数据处理任务。如果需要进一步的解释或帮助,请随时提问。
英文:
I have a data frame with multiple observations per ID. Each ID has performed many tests and for each test, has been classified into a tertile rank (top, mid, bottom) based on their test performance. This rank can vary at each time point.
Dummy df looks like this:
df <- tibble(ID = c(1,1,1,2,2,2,3,3,3,4,4,4),
time = c(1,2,3,1,2,3,1,2,3,1,2,3),
test1_rank <- c("top", "top", "top", "top", "mid", "bottom", "bottom", "bottom", "bottom", "top", "bottom", "bottom"),
test2_rank <- c("bottom", "bottom", "bottom", "top", "mid", "bottom", "top", "top", "top", "top", "bottom", "bottom"))
ID time `test1_rank <- ...` `test2_rank <- ...`
<dbl> <dbl> <chr> <chr>
1 1 1 top bottom
2 1 2 top bottom
3 1 3 top bottom
4 2 1 top top
5 2 2 mid mid
6 2 3 bottom bottom
7 3 1 bottom top
8 3 2 bottom top
9 3 3 bottom top
10 4 1 top top
11 4 2 bottom bottom
12 4 3 bottom bottom
I want to create a new variable where I classify each individual based on how their rank changes with time for each test. Specifically, for each test:
- If rank is the same through all three time points, they get a label
"stable" (can be stable(top) or stable(bottom) based on whether the
rank is "top" or "bottom" across all time points). - If the rank changes from top (time 1) to mid (time 2) to bottom (time 3), they
get a label of "gradual". - If rank declines from top (time 1) to
bottom (time 2 and time 3), they get a label of "rapid". - Any other
combination gets the label "Other".
Desired data frame:
df2 <- tibble(ID = c(1,1,1,2,2,2,3,3,3,4,4,4),
time = c(1,2,3,1,2,3,1,2,3,1,2,3),
test1_rank <- c("top", "top", "top", "top", "mid", "bottom", "bottom", "bottom", "bottom", "top", "bottom", "bottom"),
test2_rank <- c("bottom", "bottom", "bottom", "top", "mid", "bottom", "top", "top", "top", "top", "bottom", "bottom"),
test1_rankgroup <- c("stable(top)", "stable(top)", "stable(top)", "gradual", "gradual", "gradual", "stable(bottom)", "stable(bottom)", "stable(bottom)", "rapid", "rapid", "rapid"),
test2_rankgroup <- c("stable(bottom)", "stable(bottom)", "stable(bottom)", "gradual", "gradual", "gradual", "stable(top)", "stable(top)", "stable(top)", "rapid", "rapid", "rapid"))
ID time `test1_rank <- ...` `test2_rank <- ...` `test1_rankgroup <- ...` test2_ra…¹
<dbl> <dbl> <chr> <chr> <chr> <chr>
1 1 1 top bottom stable(top) stable(bo…
2 1 2 top bottom stable(top) stable(bo…
3 1 3 top bottom stable(top) stable(bo…
4 2 1 top top gradual gradual
5 2 2 mid mid gradual gradual
6 2 3 bottom bottom gradual gradual
7 3 1 bottom top stable(bottom) stable(to…
8 3 2 bottom top stable(bottom) stable(to…
9 3 3 bottom top stable(bottom) stable(to…
10 4 1 top top rapid rapid
11 4 2 bottom bottom rapid rapid
12 4 3 bottom bottom rapid rapid
What is the easiest way to solve this in dplyr
using mutate
and case_when
?
答案1
得分: 3
You can use mutate
与across
一起使用,将case_when
应用于你的两个排名列。
你的case_when
可以使用n_distinct
来查看相同的值是否在你的三个时间点保持不变。你还可以为标记的组值的第一个、第二个和第三个值包含特定的逻辑。
.default
参数可以用于"其他"情况。.by
参数将在ID
分组的基础上执行mutate
操作。
英文:
You can use mutate
with across
to apply your case_when
to both of your ranked columns.
Your case_when
can use n_distinct
to see if the same value is held across your 3 time points. You can also include specific logic for 1st, 2nd, and 3rd values for a labelled group value.
The .default argument can be used for "other". The .by argument will perform the mutate
grouped by ID
.
library(tidyverse)
df <- tibble(ID = c(1,1,1,2,2,2,3,3,3,4,4,4),
time = c(1,2,3,1,2,3,1,2,3,1,2,3),
test1_rank = c("top", "top", "top", "top", "mid", "bottom", "bottom", "bottom", "bottom", "top", "bottom", "bottom"),
test2_rank = c("bottom", "bottom", "bottom", "top", "mid", "bottom", "top", "top", "top", "top", "bottom", "bottom"))
df %>%
mutate(across(-time,
~ case_when(n_distinct(.) == 1 ~ paste("stable", .),
.[1] == "top" & .[2] == "mid" & .[3] == "bottom" ~ "gradual",
.[1] == "top" & .[2] == "bottom" & .[3] == "bottom" ~ "rapid",
.default = "other"),
.names = "{.col}_group"),
.by = ID)
Output
ID time test1_rank test2_rank test1_rank_group test2_rank_group
<dbl> <dbl> <chr> <chr> <chr> <chr>
1 1 1 top bottom stable top stable bottom
2 1 2 top bottom stable top stable bottom
3 1 3 top bottom stable top stable bottom
4 2 1 top top gradual gradual
5 2 2 mid mid gradual gradual
6 2 3 bottom bottom gradual gradual
7 3 1 bottom top stable bottom stable top
8 3 2 bottom top stable bottom stable top
9 3 3 bottom top stable bottom stable top
10 4 1 top top rapid rapid
11 4 2 bottom bottom rapid rapid
12 4 3 bottom bottom rapid rapid
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论