How to group variables that fall within a range of numbers

Question

I have a df like this:

    my_df <- data.frame(
        b1 = c(2, 6, 3, 6, 4, 2, 1, 9, NA),
        b2 = c(100, 4, 106, 102, 6, 6, 1, 1, 7),
        b3 = c(75, 79, 8, 0, 2, 3, 9, 5, 80),
        b4 = c(NA, 6, NA, 10, 12, 8, 3, 6, 2),
        b5 = c(2, 12, 1, 7, 8, 5, 5, 6, NA),
        b6 = c(9, 2, 4, 6, 7, 6, 6, 7, 9),
        b7 = c(1, 3, 7, 7, 4, 2, 2, 9, 5),
        b8 = c(NA, 8, 4, 5, 1, 4, 1, 3, 6),
        b9 = c(4, 5, 7, 9, 5, 1, 1, 2, 12))

I want to create a new column (NEW) based on the following rules:

If b9 is <= 2, write "yellow".
If b9 is between 4 and 7, write "white".
If b9 is >= 9, write "green".

The idea is to create something like this.

    my_df1 <- data.frame(
        b1 = c(2, 6, 3, 6, 4, 2, 1, 9, NA),
        b2 = c(100, 4, 106, 102, 6, 6, 1, 1, 7),
        b3 = c(75, 79, 8, 0, 2, 3, 9, 5, 80),
        b4 = c(NA, 6, NA, 10, 12, 8, 3, 6, 2),
        b5 = c(2, 12, 1, 7, 8, 5, 5, 6, NA),
        b6 = c(9, 2, 4, 6, 7, 6, 6, 7, 9),
        b7 = c(1, 3, 7, 7, 4, 2, 2, 9, 5),
        b8 = c(NA, 8, 4, 5, 1, 4, 1, 3, 6),
        b9 = c(4, 5, 7, 9, 5, 1, 1, 2, 12),
        NEW = c("white", "white", "white", "green", "white", "yellow", "yellow", "yellow", "green"))

I thought something like this would do it, but it didn't:

    greater_threshold <- 2
    greater_threshold1 <- 4
    greater_threshold2 <- 7
    greater_threshold3 <- 9

    my_df1 <- my_df %>%
        mutate(NEW = case_when(
            b9 <= greater_threshold ~ "yellow",
            b9 >= greater_threshold1 | b9 <= greater_threshold2 ~ "white",
            b9 >= greater_threshold3 ~ "green"))

Any help will be appreciated.

Answer 1

Score: 1

Your setup has some issues. For example, what you say you want would leave no label when b9 equals 3 or 8:

> If b9 is <= 2 write yellow. If b9 is between 4 and 7 write white. If
> b9 is >= 9 write green

I'm going to change the "white" condition to label b9 between 3 and 7, and "green" to >= 8, to maintain sanity. Then this should work:

    library(dplyr)  # provides mutate(), case_when() and %>%

    greater_threshold1 <- 2
    greater_threshold2 <- 7

    my_df <- mutate(my_df,
                    NEW = case_when(
                      b9 > greater_threshold2 ~ 'green',   # above 7
                      b9 > greater_threshold1 ~ 'white',   # above 2, up to 7
                      TRUE ~ 'yellow'                      # everything else
                    )) %>% print()

Output:

  b1  b2 b3 b4 b5 b6 b7 b8 b9    NEW
1  2 100 75 NA  2  9  1 NA  4  white
2  6   4 79  6 12  2  3  8  5  white
3  3 106  8 NA  1  4  7  4  7  white
4  6 102  0 10  7  6  7  5  9  green
5  4   6  2 12  8  7  4  1  5  white
6  2   6  3  8  5  6  2  4  1 yellow
7  1   1  9  3  5  6  2  1  1 yellow
8  9   1  5  6  6  7  9  3  2 yellow
9 NA   7 80  2 NA  9  5  6 12  green

case_when works best if you start with the narrowest condition and then widen as you go down the list of conditions. Each value is assigned the result of the first condition it matches, so you don't have to worry that, for example, b9 = 12 matches both of the first two conditions: it is labelled by the first one, and the later conditions are ignored for it. You also don't have to spell out your most permissive condition; just use TRUE, which basically means "if you've gotten this far, here's the assignment you are left with".
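To make the first-match behaviour concrete, here is a minimal sketch on a bare vector. The vector `x` is made up for illustration, and dplyr is assumed to be installed. It contrasts the narrow-to-wide ordering with the reverse order:

```r
library(dplyr)

# Hypothetical input, chosen to hit every branch
x <- c(1, 3, 8, 12)

# Narrowest condition first: each value takes the first condition it matches
narrow_first <- case_when(
  x > 7 ~ "green",
  x > 2 ~ "white",
  TRUE  ~ "yellow"
)

# Widest condition first: x > 2 also catches 8 and 12, so "green" is never reached
wide_first <- case_when(
  x > 2 ~ "white",
  x > 7 ~ "green",
  TRUE  ~ "yellow"
)

print(narrow_first)  # "yellow" "white" "green" "green"
print(wide_first)    # "yellow" "white" "white" "white"
```

The two calls contain the same conditions; only their order differs, which is enough to change the labels for 8 and 12.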

Your code goes the other way, so e.g. b9 = 12 gets assigned "white" because it satisfies the second condition, b9 >= greater_threshold1 | b9 <= greater_threshold2 ~ "white", and case_when never gets to the one you really want, b9 >= greater_threshold3 ~ "green". In fact, with OR that condition is true for every non-missing number, since any value is either >= 4 or <= 7. You can also get what you want, I think, if you change the OR operator in the second condition to AND, i.e. b9 >= greater_threshold1 & b9 <= greater_threshold2 ~ "white". If you use the narrow-to-wide approach that I describe, though, you don't need to mess with &, plus you get cleaner code.
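The difference between the two operators can be checked directly on the b9 column from the question. The following is a sketch, assuming dplyr is installed; the variable names `or_test`, `and_test` and `fixed` are only for illustration:

```r
library(dplyr)

b9 <- c(4, 5, 7, 9, 5, 1, 1, 2, 12)

# OR: every number is either >= 4 or <= 7, so this is TRUE for all values
or_test <- b9 >= 4 | b9 <= 7

# AND: TRUE only for values inside the closed range [4, 7]
and_test <- b9 >= 4 & b9 <= 7

print(all(or_test))  # TRUE
print(b9[and_test])  # 4 5 7 5

# The question's case_when with | replaced by &
fixed <- case_when(
  b9 <= 2           ~ "yellow",
  b9 >= 4 & b9 <= 7 ~ "white",
  b9 >= 9           ~ "green"
)
print(fixed)
```

With the AND version, the result matches the NEW column the question asks for.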

Answer 2

Score: 1

We can use cut() for this:

    library(dplyr)

    my_df %>%
      mutate(NEW = cut(b9,
                       breaks = c(-Inf, 2, 4, 7, Inf),
                       labels = c("yellow", "white", "white", "green"),
                       include.lowest = TRUE))

Output:

  b1  b2 b3 b4 b5 b6 b7 b8 b9    NEW
1  2 100 75 NA  2  9  1 NA  4  white
2  6   4 79  6 12  2  3  8  5  white
3  3 106  8 NA  1  4  7  4  7  white
4  6 102  0 10  7  6  7  5  9  green
5  4   6  2 12  8  7  4  1  5  white
6  2   6  3  8  5  6  2  4  1 yellow
7  1   1  9  3  5  6  2  1  1 yellow
8  9   1  5  6  6  7  9  3  2 yellow
9 NA   7 80  2 NA  9  5  6 12  green
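Because the two middle bins both map to "white", the same labelling can probably be written with three bins instead of four. This is a sketch in base R, not the answer's own code; it relies on cut() using right-closed intervals by default:

```r
b9 <- c(4, 5, 7, 9, 5, 1, 1, 2, 12)

# Intervals are right-closed by default: (-Inf, 2], (2, 7], (7, Inf]
new_factor <- cut(b9,
                  breaks = c(-Inf, 2, 7, Inf),
                  labels = c("yellow", "white", "green"))

print(as.character(new_factor))
print(levels(new_factor))  # "yellow" "white" "green"
```

Note that cut() returns a factor rather than a character vector, and that with these breaks a value of 3 would be labelled "white" and a value of 8 "green", which is looser than the rules stated in the question.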

Answer 3

Score: 0

You can use between from dplyr:

    library(dplyr)

    my_df %>%
      mutate(NEW = case_when(
        b9 <= 2 ~ "Yellow",
        between(b9, 4, 7) ~ "white",   # inclusive on both ends
        b9 >= 9 ~ "green"
      ))

Output:

  b1  b2 b3 b4 b5 b6 b7 b8 b9    NEW
1  2 100 75 NA  2  9  1 NA  4  white
2  6   4 79  6 12  2  3  8  5  white
3  3 106  8 NA  1  4  7  4  7  white
4  6 102  0 10  7  6  7  5  9  green
5  4   6  2 12  8  7  4  1  5  white
6  2   6  3  8  5  6  2  4  1 Yellow
7  1   1  9  3  5  6  2  1  1 Yellow
8  9   1  5  6  6  7  9  3  2 Yellow
9 NA   7 80  2 NA  9  5  6 12  green

Values that match none of the conditions (e.g. 3 or 8) will be NA.
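If NA is not wanted for the gaps, one option is to add a catch-all branch at the end. The sketch below assumes dplyr is installed; the vector `x` and the label "unlabelled" are hypothetical and only serve to show the behaviour:

```r
library(dplyr)

# Hypothetical values, including the gaps (3, 8) and a missing value
x <- c(2, 3, 4, 7, 8, 9, NA)

# No catch-all: values that match nothing become NA
labelled <- case_when(
  x <= 2           ~ "yellow",
  between(x, 4, 7) ~ "white",
  x >= 9           ~ "green"
)
print(labelled)  # "yellow" NA "white" "white" NA "green" NA

# Catch-all for the gaps, while keeping genuinely missing input as NA
with_fallback <- case_when(
  x <= 2           ~ "yellow",
  between(x, 4, 7) ~ "white",
  x >= 9           ~ "green",
  is.na(x)         ~ NA_character_,
  TRUE             ~ "unlabelled"
)
print(with_fallback)  # "yellow" "unlabelled" "white" "white" "unlabelled" "green" NA
```

The is.na() branch is placed before TRUE so that a missing b9 stays missing instead of being given the fallback label.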

huangapple
  • Published on June 1, 2023, 22:08:46
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76382806.html