Mutate case_when nested conditional labelling

Question

# Create a new variable that classifies each individual by how their test rank changes over time.
# Assumes the rows are already sorted by time within each ID.
library(dplyr)

df2 <- df %>%
  group_by(ID) %>%  # evaluate the rules within each ID, not over the whole column
  mutate(test1_rankgroup = case_when(
    all(test1_rank == "top") ~ "stable(top)",
    all(test1_rank == "bottom") ~ "stable(bottom)",
    test1_rank[1] == "top" & test1_rank[2] == "mid" & test1_rank[3] == "bottom" ~ "gradual",
    # "rapid" requires bottom at both time 2 and time 3
    test1_rank[1] == "top" & test1_rank[2] == "bottom" & test1_rank[3] == "bottom" ~ "rapid",
    TRUE ~ "Other"
  ),
  test2_rankgroup = case_when(
    all(test2_rank == "top") ~ "stable(top)",
    all(test2_rank == "bottom") ~ "stable(bottom)",
    test2_rank[1] == "top" & test2_rank[2] == "mid" & test2_rank[3] == "bottom" ~ "gradual",
    test2_rank[1] == "top" & test2_rank[2] == "bottom" & test2_rank[3] == "bottom" ~ "rapid",
    TRUE ~ "Other"
  )) %>%
  ungroup()


I have a data frame with multiple observations per ID. Each ID has performed many tests and for each test, has been classified into a tertile rank (top, mid, bottom) based on their test performance. This rank can vary at each time point.

Dummy df looks like this:

df <- tibble(ID = c(1,1,1,2,2,2,3,3,3,4,4,4),
             time = c(1,2,3,1,2,3,1,2,3,1,2,3),
             test1_rank = c("top", "top", "top", "top", "mid", "bottom", "bottom", "bottom", "bottom", "top", "bottom", "bottom"),
             test2_rank = c("bottom", "bottom", "bottom", "top", "mid", "bottom", "top", "top", "top", "top", "bottom", "bottom"))

      ID  time test1_rank test2_rank
   <dbl> <dbl> <chr>      <chr>     
 1     1     1 top        bottom    
 2     1     2 top        bottom    
 3     1     3 top        bottom    
 4     2     1 top        top       
 5     2     2 mid        mid       
 6     2     3 bottom     bottom    
 7     3     1 bottom     top       
 8     3     2 bottom     top       
 9     3     3 bottom     top       
10     4     1 top        top       
11     4     2 bottom     bottom    
12     4     3 bottom     bottom    

I want to create a new variable where I classify each individual based on how their rank changes with time for each test. Specifically, for each test:

  • If the rank is the same at all three time points, they get the label
    "stable" (either stable(top) or stable(bottom), depending on whether the
    rank is "top" or "bottom" across all time points).
  • If the rank changes from top (time 1) to mid (time 2) to bottom (time 3),
    they get the label "gradual".
  • If the rank declines from top (time 1) to bottom (time 2 and time 3),
    they get the label "rapid".
  • Any other combination gets the label "Other".
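
The four rules above can be sketched as a small base R function that labels one individual's ranks for one test. The name classify_ranks is hypothetical, and the sketch assumes exactly three ranks, already ordered by time. It also assumes that a constant "mid" rank falls under "Other", since the rules only name stable(top) and stable(bottom).

```r
# Hypothetical helper (not from the original post): classify one ID's ranks
# for a single test. Assumes `ranks` has length 3 and is ordered by time.
classify_ranks <- function(ranks) {
  ranks <- as.character(ranks)  # also drops names and factor levels
  stopifnot(length(ranks) == 3)

  if (all(ranks == "top")) {
    "stable(top)"
  } else if (all(ranks == "bottom")) {
    "stable(bottom)"
  } else if (identical(ranks, c("top", "mid", "bottom"))) {
    "gradual"
  } else if (identical(ranks, c("top", "bottom", "bottom"))) {
    "rapid"
  } else {
    "Other"  # assumption: a constant "mid" rank also lands here
  }
}

classify_ranks(c("top", "mid", "bottom"))  # "gradual"
classify_ranks(c("mid", "mid", "mid"))     # "Other"
```

Spelling the rules out for a single individual first makes it easier to check each branch before moving the same logic into a grouped mutate.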

Desired data frame:

df2 <- tibble(ID = c(1,1,1,2,2,2,3,3,3,4,4,4),
              time = c(1,2,3,1,2,3,1,2,3,1,2,3),
              test1_rank = c("top", "top", "top", "top", "mid", "bottom", "bottom", "bottom", "bottom", "top", "bottom", "bottom"),
              test2_rank = c("bottom", "bottom", "bottom", "top", "mid", "bottom", "top", "top", "top", "top", "bottom", "bottom"),
              test1_rankgroup = c("stable(top)", "stable(top)", "stable(top)", "gradual", "gradual", "gradual", "stable(bottom)", "stable(bottom)", "stable(bottom)", "rapid", "rapid", "rapid"),
              test2_rankgroup = c("stable(bottom)", "stable(bottom)", "stable(bottom)", "gradual", "gradual", "gradual", "stable(top)", "stable(top)", "stable(top)", "rapid", "rapid", "rapid"))

      ID  time test1_rank test2_rank test1_rankgroup test2_rankgroup
   <dbl> <dbl> <chr>      <chr>      <chr>           <chr>          
 1     1     1 top        bottom     stable(top)     stable(bottom) 
 2     1     2 top        bottom     stable(top)     stable(bottom) 
 3     1     3 top        bottom     stable(top)     stable(bottom) 
 4     2     1 top        top        gradual         gradual        
 5     2     2 mid        mid        gradual         gradual        
 6     2     3 bottom     bottom     gradual         gradual        
 7     3     1 bottom     top        stable(bottom)  stable(top)    
 8     3     2 bottom     top        stable(bottom)  stable(top)    
 9     3     3 bottom     top        stable(bottom)  stable(top)    
10     4     1 top        top        rapid           rapid          
11     4     2 bottom     bottom     rapid           rapid          
12     4     3 bottom     bottom     rapid           rapid          

What is the easiest way to solve this in dplyr using mutate and case_when?

Answer 1

Score: 3


You can use mutate with across to apply your case_when to both of your ranked columns.

Your case_when can use n_distinct to see if the same value is held across your 3 time points. You can also include specific logic for 1st, 2nd, and 3rd values for a labelled group value.

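The .default argument can be used for "other". The .by argument will perform the mutate grouped by ID. Both arguments require dplyr 1.1.0 or later. The positional checks (.[1], .[2], .[3]) rely on the rows being sorted by time within each ID.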

library(tidyverse)

df <- tibble(ID = c(1,1,1,2,2,2,3,3,3,4,4,4),
             time = c(1,2,3,1,2,3,1,2,3,1,2,3),
             test1_rank = c("top", "top", "top", "top", "mid", "bottom", "bottom", "bottom", "bottom", "top", "bottom", "bottom"),
             test2_rank = c("bottom", "bottom", "bottom", "top", "mid", "bottom", "top", "top", "top", "top", "bottom", "bottom"))

df %>%
  mutate(across(-time,
               ~ case_when(n_distinct(.) == 1 ~ paste("stable", .),
                           .[1] == "top" & .[2] == "mid" & .[3] == "bottom" ~ "gradual",
                           .[1] == "top" & .[2] == "bottom" & .[3] == "bottom" ~ "rapid",
                           .default = "other"), 
               .names = "{.col}_group"),
         .by = ID)

Output

      ID  time test1_rank test2_rank test1_rank_group test2_rank_group
   <dbl> <dbl> <chr>      <chr>      <chr>            <chr>           
 1     1     1 top        bottom     stable top       stable bottom   
 2     1     2 top        bottom     stable top       stable bottom   
 3     1     3 top        bottom     stable top       stable bottom   
 4     2     1 top        top        gradual          gradual         
 5     2     2 mid        mid        gradual          gradual         
 6     2     3 bottom     bottom     gradual          gradual         
 7     3     1 bottom     top        stable bottom    stable top      
 8     3     2 bottom     top        stable bottom    stable top      
 9     3     3 bottom     top        stable bottom    stable top      
10     4     1 top        top        rapid            rapid           
11     4     2 bottom     bottom     rapid            rapid           
12     4     3 bottom     bottom     rapid            rapid   
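
The output above uses labels such as "stable top" and column names ending in _group, which differ slightly from the desired data frame in the question. A possible variant that reproduces the requested labels and names is sketched below. It is an adaptation, not part of the original answer, and it assumes dplyr 1.1.0 or later, exactly three time points per ID, and that only constant "top" or "bottom" ranks count as stable.

```r
library(dplyr)  # assumes dplyr >= 1.1.0 for .by and .default

df <- tibble::tibble(
  ID = rep(1:4, each = 3),
  time = rep(1:3, times = 4),
  test1_rank = c("top", "top", "top", "top", "mid", "bottom",
                 "bottom", "bottom", "bottom", "top", "bottom", "bottom"),
  test2_rank = c("bottom", "bottom", "bottom", "top", "mid", "bottom",
                 "top", "top", "top", "top", "bottom", "bottom")
)

df2 <- df %>%
  arrange(ID, time) %>%  # the positional checks below depend on time order
  mutate(
    across(
      ends_with("_rank"),
      ~ case_when(
        # assumption: only constant "top" or "bottom" is labelled stable
        n_distinct(.x) == 1 & .x[1] %in% c("top", "bottom") ~
          paste0("stable(", .x[1], ")"),
        .x[1] == "top" & .x[2] == "mid" & .x[3] == "bottom" ~ "gradual",
        .x[1] == "top" & .x[2] == "bottom" & .x[3] == "bottom" ~ "rapid",
        .default = "Other"
      ),
      .names = "{.col}group"  # test1_rank -> test1_rankgroup
    ),
    .by = ID
  )

df2
```

Each condition here has length one per ID, so case_when returns a single label that mutate recycles to every row of that ID. On dplyr versions before 1.1.0, group_by(ID) followed by ungroup() and a final TRUE ~ "Other" branch would play the roles of .by and .default.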
