April 19, 2023 22:23:11

Mutate case_when nested conditional labelling

Question

The code section:

library(dplyr)

# Create a new variable classifying each individual by how their test performance changes.
# Assumes df has columns named ID, test1_rank, and test2_rank, with rows ordered by time within each ID.
df2 <- df %>%
  mutate(test1_rankgroup = case_when(
    all(test1_rank == "top") ~ "stable(top)",
    all(test1_rank == "bottom") ~ "stable(bottom)",
    test1_rank[1] == "top" & test1_rank[2] == "mid" & test1_rank[3] == "bottom" ~ "gradual",
    test1_rank[1] == "top" & test1_rank[2] == "bottom" & test1_rank[3] == "bottom" ~ "rapid",
    TRUE ~ "Other"
  ),
  test2_rankgroup = case_when(
    all(test2_rank == "top") ~ "stable(top)",
    all(test2_rank == "bottom") ~ "stable(bottom)",
    test2_rank[1] == "top" & test2_rank[2] == "mid" & test2_rank[3] == "bottom" ~ "gradual",
    test2_rank[1] == "top" & test2_rank[2] == "bottom" & test2_rank[3] == "bottom" ~ "rapid",
    TRUE ~ "Other"
  ),
  .by = ID)  # evaluate the conditions per individual rather than over the whole column

I have a data frame with multiple observations per ID. Each ID has performed many tests and for each test, has been classified into a tertile rank (top, mid, bottom) based on their test performance. This rank can vary at each time point.

Dummy df looks like this:

df <- tibble(ID = c(1,1,1,2,2,2,3,3,3,4,4,4),
             time = c(1,2,3,1,2,3,1,2,3,1,2,3),
             test1_rank = c("top", "top", "top", "top", "mid", "bottom", "bottom", "bottom", "bottom", "top", "bottom", "bottom"),
             test2_rank = c("bottom", "bottom", "bottom", "top", "mid", "bottom", "top", "top", "top", "top", "bottom", "bottom"))

      ID  time test1_rank test2_rank
   <dbl> <dbl> <chr>      <chr>
 1     1     1 top        bottom
 2     1     2 top        bottom
 3     1     3 top        bottom
 4     2     1 top        top
 5     2     2 mid        mid
 6     2     3 bottom     bottom
 7     3     1 bottom     top
 8     3     2 bottom     top
 9     3     3 bottom     top
10     4     1 top        top
11     4     2 bottom     bottom
12     4     3 bottom     bottom

I want to create a new variable where I classify each individual based on how their rank changes with time for each test. Specifically, for each test:

If the rank is the same through all three time points, they get a label "stable" (can be stable(top) or stable(bottom) based on whether the rank is "top" or "bottom" across all time points).
If the rank changes from top (time 1) to mid (time 2) to bottom (time 3), they get a label of "gradual".
If the rank declines from top (time 1) to bottom (time 2 and time 3), they get a label of "rapid".
Any other combination gets the label "Other".
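
As a rough illustration, the rules above can be written as a plain R function applied to one individual's ranks. This is only a sketch: classify_trajectory is a hypothetical helper name, not something from the question, and it assumes exactly three ranks already sorted by time.

```r
# Hypothetical helper (not from the question): label one individual's
# rank trajectory. Assumes `ranks` holds exactly three values ordered by time.
classify_trajectory <- function(ranks) {
  stopifnot(length(ranks) == 3)
  if (all(ranks == "top")) {
    "stable(top)"
  } else if (all(ranks == "bottom")) {
    "stable(bottom)"
  } else if (all(ranks == c("top", "mid", "bottom"))) {
    "gradual"
  } else if (all(ranks == c("top", "bottom", "bottom"))) {
    "rapid"
  } else {
    "Other"
  }
}

classify_trajectory(c("top", "mid", "bottom"))   # "gradual"
classify_trajectory(c("bottom", "mid", "top"))   # "Other"
```

Writing the rules per individual first makes it clear that any dplyr solution has to evaluate them within each ID rather than across the whole column.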

Desired data frame:

df2 <- tibble(ID = c(1,1,1,2,2,2,3,3,3,4,4,4),
              time = c(1,2,3,1,2,3,1,2,3,1,2,3),
              test1_rank = c("top", "top", "top", "top", "mid", "bottom", "bottom", "bottom", "bottom", "top", "bottom", "bottom"),
              test2_rank = c("bottom", "bottom", "bottom", "top", "mid", "bottom", "top", "top", "top", "top", "bottom", "bottom"),
              test1_rankgroup = c("stable(top)", "stable(top)", "stable(top)", "gradual", "gradual", "gradual", "stable(bottom)", "stable(bottom)", "stable(bottom)", "rapid", "rapid", "rapid"),
              test2_rankgroup = c("stable(bottom)", "stable(bottom)", "stable(bottom)", "gradual", "gradual", "gradual", "stable(top)", "stable(top)", "stable(top)", "rapid", "rapid", "rapid"))

      ID  time test1_rank test2_rank test1_rankgroup test2_rankgroup
   <dbl> <dbl> <chr>      <chr>      <chr>           <chr>
 1     1     1 top        bottom     stable(top)     stable(bottom)
 2     1     2 top        bottom     stable(top)     stable(bottom)
 3     1     3 top        bottom     stable(top)     stable(bottom)
 4     2     1 top        top        gradual         gradual
 5     2     2 mid        mid        gradual         gradual
 6     2     3 bottom     bottom     gradual         gradual
 7     3     1 bottom     top        stable(bottom)  stable(top)
 8     3     2 bottom     top        stable(bottom)  stable(top)
 9     3     3 bottom     top        stable(bottom)  stable(top)
10     4     1 top        top        rapid           rapid
11     4     2 bottom     bottom     rapid           rapid
12     4     3 bottom     bottom     rapid           rapid

What is the easiest way to solve this in dplyr using mutate and case_when?

Answer 1

Score: 3

You can use mutate with across to apply the same case_when to both rank columns.

Inside case_when, n_distinct checks whether a single value is held across all 3 time points. You can also add explicit conditions on the 1st, 2nd, and 3rd values to assign a specific label; these positional checks assume the rows within each ID are ordered by time.

The .default argument covers the "other" case, and the .by argument makes mutate operate per ID. Both arguments require dplyr 1.1.0 or later.

library(tidyverse)
df <- tibble(ID = c(1,1,1,2,2,2,3,3,3,4,4,4),
             time = c(1,2,3,1,2,3,1,2,3,1,2,3),
             test1_rank = c("top", "top", "top", "top", "mid", "bottom", "bottom", "bottom", "bottom", "top", "bottom", "bottom"),
             test2_rank = c("bottom", "bottom", "bottom", "top", "mid", "bottom", "top", "top", "top", "top", "bottom", "bottom"))
df %>%
  mutate(across(-time,  # every column except time (the .by column ID is excluded automatically)
               # a single distinct value within the ID means the rank never changed
               ~ case_when(n_distinct(.) == 1 ~ paste("stable", .),
                           .[1] == "top" & .[2] == "mid" & .[3] == "bottom" ~ "gradual",
                           .[1] == "top" & .[2] == "bottom" & .[3] == "bottom" ~ "rapid",
                           .default = "other"),
               .names = "{.col}_group"),  # e.g. test1_rank -> test1_rank_group
         .by = ID)

Output

      ID  time test1_rank test2_rank test1_rank_group test2_rank_group
   <dbl> <dbl> <chr>      <chr>      <chr>            <chr>
 1     1     1 top        bottom     stable top       stable bottom   
 2     1     2 top        bottom     stable top       stable bottom   
 3     1     3 top        bottom     stable top       stable bottom   
 4     2     1 top        top        gradual          gradual         
 5     2     2 mid        mid        gradual          gradual         
 6     2     3 bottom     bottom     gradual          gradual         
 7     3     1 bottom     top        stable bottom    stable top      
 8     3     2 bottom     top        stable bottom    stable top      
 9     3     3 bottom     top        stable bottom    stable top      
10     4     1 top        top        rapid            rapid           
11     4     2 bottom     bottom     rapid            rapid           
12     4     3 bottom     bottom     rapid            rapid
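
The output above labels stable cases as "stable top" / "stable bottom" and names the new columns test1_rank_group and test2_rank_group, whereas the question asks for "stable(top)" / "stable(bottom)" in columns named test1_rankgroup and test2_rankgroup. A possible variant that matches the requested labels is sketched below. It assumes dplyr 1.1.0 or later (for .by and .default) and that an all-"mid" individual should fall under "Other", which the question does not state explicitly. The arrange step is an addition to make the positional checks independent of the input row order.

```r
library(dplyr)

df <- tibble(
  ID = rep(1:4, each = 3),
  time = rep(1:3, times = 4),
  test1_rank = c("top", "top", "top", "top", "mid", "bottom",
                 "bottom", "bottom", "bottom", "top", "bottom", "bottom"),
  test2_rank = c("bottom", "bottom", "bottom", "top", "mid", "bottom",
                 "top", "top", "top", "top", "bottom", "bottom")
)

df2 <- df %>%
  arrange(ID, time) %>%  # positional checks below rely on time order within each ID
  mutate(
    across(
      c(test1_rank, test2_rank),
      ~ case_when(
        all(.x == "top") ~ "stable(top)",
        all(.x == "bottom") ~ "stable(bottom)",
        .x[1] == "top" & .x[2] == "mid" & .x[3] == "bottom" ~ "gradual",
        .x[1] == "top" & .x[2] == "bottom" & .x[3] == "bottom" ~ "rapid",
        .default = "Other"  # assumption: everything else, including all "mid"
      ),
      .names = "{.col}group"  # test1_rank -> test1_rankgroup
    ),
    .by = ID  # evaluate each condition per individual
  )

df2
```

Every condition here reduces to a single TRUE/FALSE per ID, so case_when returns one label per group and mutate recycles it to all of that individual's rows.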
