Mutate case_when nested conditional labelling


Question

I have a data frame with multiple observations per ID. Each ID has performed many tests and, for each test, has been classified into a tertile rank (top, mid, bottom) based on their performance. This rank can vary at each time point.

The dummy data frame looks like this:

    library(tibble)

    df <- tibble(ID = c(1,1,1,2,2,2,3,3,3,4,4,4),
                 time = c(1,2,3,1,2,3,1,2,3,1,2,3),
                 # `=` (not `<-`) names the columns test1_rank and test2_rank
                 test1_rank = c("top", "top", "top", "top", "mid", "bottom", "bottom", "bottom", "bottom", "top", "bottom", "bottom"),
                 test2_rank = c("bottom", "bottom", "bottom", "top", "mid", "bottom", "top", "top", "top", "top", "bottom", "bottom"))
    df

which prints:

          ID  time test1_rank test2_rank
       <dbl> <dbl> <chr>      <chr>
     1     1     1 top        bottom
     2     1     2 top        bottom
     3     1     3 top        bottom
     4     2     1 top        top
     5     2     2 mid        mid
     6     2     3 bottom     bottom
     7     3     1 bottom     top
     8     3     2 bottom     top
     9     3     3 bottom     top
    10     4     1 top        top
    11     4     2 bottom     bottom
    12     4     3 bottom     bottom

I want to create a new variable where I classify each individual based on how their rank changes with time for each test. Specifically, for each test:

  • If the rank is the same at all three time points, they get the label
    "stable": stable(top) or stable(bottom), depending on whether the
    rank is "top" or "bottom" throughout.
  • If the rank changes from top (time 1) to mid (time 2) to bottom (time 3),
    they get the label "gradual".
  • If the rank declines from top (time 1) to bottom (times 2 and 3), they
    get the label "rapid".
  • Any other combination gets the label "Other".
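The four rules amount to a small decision procedure over one ID's three ranks. As a minimal base-R sketch (the helper name `classify_ranks()` is hypothetical, and it assumes the ranks are character values already ordered by time), the rules could be written as:

```r
# Hypothetical helper (not from the question): label one ID's ranks for one test.
# Assumes `r` is a character vector of exactly three ranks, ordered by time 1, 2, 3.
classify_ranks <- function(r) {
  stopifnot(is.character(r), length(r) == 3)
  if (all(r == "top")) return("stable(top)")        # top at every time point
  if (all(r == "bottom")) return("stable(bottom)")  # bottom at every time point
  if (identical(unname(r), c("top", "mid", "bottom"))) return("gradual")
  if (identical(unname(r), c("top", "bottom", "bottom"))) return("rapid")
  "Other"  # any other sequence; an all-"mid" sequence also lands here (an assumption)
}

classify_ranks(c("top", "mid", "bottom"))     # "gradual"
classify_ranks(c("top", "bottom", "bottom"))  # "rapid"
classify_ranks(c("mid", "top", "bottom"))     # "Other"
```

Keeping the rules in a plain function makes them easy to check in isolation; a `case_when()` call can express the same checks inline per group.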

Desired data frame:

    library(tibble)

    df2 <- tibble(ID = c(1,1,1,2,2,2,3,3,3,4,4,4),
                  time = c(1,2,3,1,2,3,1,2,3,1,2,3),
                  test1_rank = c("top", "top", "top", "top", "mid", "bottom", "bottom", "bottom", "bottom", "top", "bottom", "bottom"),
                  test2_rank = c("bottom", "bottom", "bottom", "top", "mid", "bottom", "top", "top", "top", "top", "bottom", "bottom"),
                  test1_rankgroup = c("stable(top)", "stable(top)", "stable(top)", "gradual", "gradual", "gradual", "stable(bottom)", "stable(bottom)", "stable(bottom)", "rapid", "rapid", "rapid"),
                  test2_rankgroup = c("stable(bottom)", "stable(bottom)", "stable(bottom)", "gradual", "gradual", "gradual", "stable(top)", "stable(top)", "stable(top)", "rapid", "rapid", "rapid"))
    df2

which prints:

          ID  time test1_rank test2_rank test1_rankgroup test2_rankgroup
       <dbl> <dbl> <chr>      <chr>      <chr>           <chr>
     1     1     1 top        bottom     stable(top)     stable(bottom)
     2     1     2 top        bottom     stable(top)     stable(bottom)
     3     1     3 top        bottom     stable(top)     stable(bottom)
     4     2     1 top        top        gradual         gradual
     5     2     2 mid        mid        gradual         gradual
     6     2     3 bottom     bottom     gradual         gradual
     7     3     1 bottom     top        stable(bottom)  stable(top)
     8     3     2 bottom     top        stable(bottom)  stable(top)
     9     3     3 bottom     top        stable(bottom)  stable(top)
    10     4     1 top        top        rapid           rapid
    11     4     2 bottom     bottom     rapid           rapid
    12     4     3 bottom     bottom     rapid           rapid

What is the easiest way to solve this in dplyr using mutate and case_when?

Answer 1

Score: 3


You can use mutate with across to apply your case_when to both of your ranked columns.

Your case_when can use n_distinct to check whether the same value is held across all 3 time points. You can also index the 1st, 2nd, and 3rd values within each ID to define the "gradual" and "rapid" patterns; this relies on the rows being sorted by time within each ID.
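To see what `n_distinct()` and positional indexing operate on, a short sketch can summarise each ID separately. It assumes dplyr 1.1.0 or later (for `.by`) and sorts by `time` first, since `[1]`, `[2]` and `[3]` refer to row positions within an ID rather than to time values; the column names `n_ranks`, `t1`, `t2`, `t3` are illustrative only.

```r
library(dplyr)

df <- tibble(ID = c(1,1,1,2,2,2,3,3,3,4,4,4),
             time = c(1,2,3,1,2,3,1,2,3,1,2,3),
             test1_rank = c("top", "top", "top", "top", "mid", "bottom",
                            "bottom", "bottom", "bottom", "top", "bottom", "bottom"))

per_id <- df %>%
  arrange(ID, time) %>%                        # positions 1-3 must line up with times 1-3
  summarise(n_ranks = n_distinct(test1_rank),  # 1 means the rank never changed
            t1 = test1_rank[1],
            t2 = test1_rank[2],
            t3 = test1_rank[3],
            .by = ID)
per_id
```

ID 2 has three distinct ranks in the order top, mid, bottom (the "gradual" pattern), and ID 4 has two (top, then bottom twice), which is what the indexed conditions in the answer's `case_when()` test for.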

The .default argument of case_when supplies the "other" label, and the .by argument of mutate performs the calculation grouped by ID. Both arguments require dplyr 1.1.0 or later.
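On dplyr versions older than 1.1.0, which lack `.by` and `.default`, the same approach can be sketched by grouping explicitly with `group_by()` and using a `TRUE ~` catch-all. The column names and labels below follow the answer's code; the `arrange()` step is an added safeguard.

```r
library(dplyr)

df <- tibble(ID = c(1,1,1,2,2,2,3,3,3,4,4,4),
             time = c(1,2,3,1,2,3,1,2,3,1,2,3),
             test1_rank = c("top", "top", "top", "top", "mid", "bottom",
                            "bottom", "bottom", "bottom", "top", "bottom", "bottom"),
             test2_rank = c("bottom", "bottom", "bottom", "top", "mid", "bottom",
                            "top", "top", "top", "top", "bottom", "bottom"))

res <- df %>%
  arrange(ID, time) %>%
  group_by(ID) %>%                             # plays the role of `.by = ID`
  mutate(across(-time,                         # the grouping column ID is skipped automatically
                ~ case_when(n_distinct(.) == 1 ~ paste("stable", .),
                            .[1] == "top" & .[2] == "mid" & .[3] == "bottom" ~ "gradual",
                            .[1] == "top" & .[2] == "bottom" & .[3] == "bottom" ~ "rapid",
                            TRUE ~ "other"),   # catch-all in place of `.default`
                .names = "{.col}_group")) %>%
  ungroup()                                    # drop the grouping, as `.by` does
res
```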

    library(tidyverse)

    df <- tibble(ID = c(1,1,1,2,2,2,3,3,3,4,4,4),
                 time = c(1,2,3,1,2,3,1,2,3,1,2,3),
                 test1_rank = c("top", "top", "top", "top", "mid", "bottom", "bottom", "bottom", "bottom", "top", "bottom", "bottom"),
                 test2_rank = c("bottom", "bottom", "bottom", "top", "mid", "bottom", "top", "top", "top", "top", "bottom", "bottom"))

    df %>%
      mutate(across(-time,  # every non-grouping column except time, i.e. both rank columns
                    ~ case_when(n_distinct(.) == 1 ~ paste("stable", .),  # same rank at all times
                                .[1] == "top" & .[2] == "mid" & .[3] == "bottom" ~ "gradual",
                                .[1] == "top" & .[2] == "bottom" & .[3] == "bottom" ~ "rapid",
                                .default = "other"),
                    .names = "{.col}_group"),  # e.g. test1_rank -> test1_rank_group
             .by = ID)  # evaluate each ID's rows separately

Output

          ID  time test1_rank test2_rank test1_rank_group test2_rank_group
       <dbl> <dbl> <chr>      <chr>      <chr>            <chr>
     1     1     1 top        bottom     stable top       stable bottom
     2     1     2 top        bottom     stable top       stable bottom
     3     1     3 top        bottom     stable top       stable bottom
     4     2     1 top        top        gradual          gradual
     5     2     2 mid        mid        gradual          gradual
     6     2     3 bottom     bottom     gradual          gradual
     7     3     1 bottom     top        stable bottom    stable top
     8     3     2 bottom     top        stable bottom    stable top
     9     3     3 bottom     top        stable bottom    stable top
    10     4     1 top        top        rapid            rapid
    11     4     2 bottom     bottom     rapid            rapid
    12     4     3 bottom     bottom     rapid            rapid
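The answer's labels (`stable top`, `other`) and column names (`test1_rank_group`) differ slightly from the desired `stable(top)`, `Other` and `test1_rankgroup`. A small adaptation, assuming dplyr 1.1.0 or later, should reproduce the desired data frame; selecting the two rank columns by name instead of `-time` is a stylistic choice. Note that an ID ranked `mid` at all three times would be labelled `stable(mid)` by this version.

```r
library(dplyr)

df <- tibble(ID = c(1,1,1,2,2,2,3,3,3,4,4,4),
             time = c(1,2,3,1,2,3,1,2,3,1,2,3),
             test1_rank = c("top", "top", "top", "top", "mid", "bottom",
                            "bottom", "bottom", "bottom", "top", "bottom", "bottom"),
             test2_rank = c("bottom", "bottom", "bottom", "top", "mid", "bottom",
                            "top", "top", "top", "top", "bottom", "bottom"))

df2 <- df %>%
  arrange(ID, time) %>%                        # positional indexing assumes time order
  mutate(across(c(test1_rank, test2_rank),
                ~ case_when(n_distinct(.x) == 1 ~ paste0("stable(", .x, ")"),
                            .x[1] == "top" & .x[2] == "mid" & .x[3] == "bottom" ~ "gradual",
                            .x[1] == "top" & .x[2] == "bottom" & .x[3] == "bottom" ~ "rapid",
                            .default = "Other"),
                .names = "{.col}group"),       # test1_rank -> test1_rankgroup
         .by = ID)
df2
```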

huangapple
  • Published on April 19, 2023 at 22:23:11
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76055651.html