How to group values that fall within a range of numbers
Question
I have a data frame like this:
my_df <- data.frame(
b1 = c(2, 6, 3, 6, 4, 2, 1, 9, NA),
b2 = c(100, 4, 106, 102, 6, 6, 1, 1, 7),
b3 = c(75, 79, 8, 0, 2, 3, 9, 5, 80),
b4 = c(NA, 6, NA, 10, 12, 8, 3, 6, 2),
b5 = c(2, 12, 1, 7, 8, 5, 5, 6, NA),
b6 = c(9, 2, 4, 6, 7, 6, 6, 7, 9),
b7 = c(1, 3, 7, 7, 4, 2, 2, 9, 5),
b8 = c(NA, 8, 4, 5, 1, 4, 1, 3, 6),
b9 = c(4, 5, 7, 9, 5, 1, 1, 2, 12))
I want to create a new column (NEW) based on the following rules:
If b9 is <= 2, write "yellow".
If b9 is between 4 and 7, write "white".
If b9 is >= 9, write "green".
The idea is to create something like this:
my_df1 <- data.frame(
b1 = c(2, 6, 3, 6, 4, 2, 1, 9, NA),
b2 = c(100, 4, 106, 102, 6, 6, 1, 1, 7),
b3 = c(75, 79, 8, 0, 2, 3, 9, 5, 80),
b4 = c(NA, 6, NA, 10, 12, 8, 3, 6, 2),
b5 = c(2, 12, 1, 7, 8, 5, 5, 6, NA),
b6 = c(9, 2, 4, 6, 7, 6, 6, 7, 9),
b7 = c(1, 3, 7, 7, 4, 2, 2, 9, 5),
b8 = c(NA, 8, 4, 5, 1, 4, 1, 3, 6),
b9 = c(4, 5, 7, 9, 5, 1, 1, 2, 12),
NEW = c("white", "white", "white", "green", "white", "yellow", "yellow", "yellow", "green"))
I thought something like this would do it, but it didn't:
greater_threshold <- 2
greater_threshold1 <- 4
greater_threshold2 <- 7
greater_threshold3 <- 9
my_df1 <- my_df %>%
mutate(NEW = case_when(b9 <= greater_threshold ~ "yellow", b9 >= greater_threshold1 | b9 <= greater_threshold2 ~ "white", b9 >= greater_threshold3 ~ "green"))
Any help will be appreciated.
Answer 1
Score: 1
Your setup has some gaps. For example, the rules as you state them would leave no label when b9 equals 3 or 8:
> If b9 is <= 2 write yellow. If b9 is between 4 and 7 write white. If
> b9 is >= 9 write green
I'm going to change the "white" condition to label b9 between 3 and 7, and "green" to >= 8, to maintain sanity. Then this should work:
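library(dplyr)

greater_threshold1 <- 2
greater_threshold2 <- 7

# Conditions are checked top to bottom; each row gets the first label that matches
my_df <- mutate(my_df,
  NEW = case_when(
    b9 > greater_threshold2 ~ 'green',   # above 7
    b9 > greater_threshold1 ~ 'white',   # above 2, up to 7
    TRUE ~ 'yellow'                      # everything else
  )) %>% print()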
Output:
b1 b2 b3 b4 b5 b6 b7 b8 b9 NEW
1 2 100 75 NA 2 9 1 NA 4 white
2 6 4 79 6 12 2 3 8 5 white
3 3 106 8 NA 1 4 7 4 7 white
4 6 102 0 10 7 6 7 5 9 green
5 4 6 2 12 8 7 4 1 5 white
6 2 6 3 8 5 6 2 4 1 yellow
7 1 1 9 3 5 6 2 1 1 yellow
8 9 1 5 6 6 7 9 3 2 yellow
9 NA 7 80 2 NA 9 5 6 12 green
case_when works best if you start with the narrowest condition and then move wider as you go down the list of conditions. Each value gets the label of the first condition it matches, so you don't have to worry that, for example, b9=12 would match both of the first two conditions: it is labelled by the first one, and the second is ignored for that value. You don't even have to define your most permissive condition; just use TRUE, which basically means: if you've gotten this far, here's the assignment you are left with.
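As a small illustration of the first-match rule described above, here is a sketch using a made-up vector x (not taken from the question) that includes a missing value. It assumes dplyr is installed. One thing worth noticing is that the TRUE fallback also catches NA, so a missing value would be labelled "yellow" unless it is handled first.

```r
library(dplyr)

# Hypothetical test values, including a missing one
x <- c(1, 3, 8, 12, NA)

# Each element gets the label of the first condition that is TRUE for it
first_match <- case_when(
  x > 7 ~ "green",
  x > 2 ~ "white",
  TRUE  ~ "yellow"   # also catches NA, because NA conditions never match
)
print(first_match)

# Handling NA before the other conditions keeps missing values missing
na_safe <- case_when(
  is.na(x) ~ NA_character_,
  x > 7    ~ "green",
  x > 2    ~ "white",
  TRUE     ~ "yellow"
)
print(na_safe)
```

In the question's data b9 has no missing values, so both versions would give the same labels there.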
Your code goes backwards, so e.g. b9=12 gets assigned "white" because it satisfies the second condition, b9 >= greater_threshold1 | b9 <= greater_threshold2 ~ "white" (with |, that test is true for every non-missing number), and case_when never gets to the one you really want, b9 >= greater_threshold3 ~ "green". You can also get what you want, I think, if you change the OR operator in the second condition to AND, i.e. b9 >= greater_threshold1 & b9 <= greater_threshold2 ~ "white". If you use the narrow-to-wider approach that I describe, though, you don't need to mess with &, and you get cleaner code.
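To make the OR versus AND point concrete, here is a minimal sketch on the question's b9 values. It assumes dplyr is installed; the names or_test, and_test and fixed are only for illustration.

```r
library(dplyr)

b9 <- c(4, 5, 7, 9, 5, 1, 1, 2, 12)

# With OR, the test is TRUE for every non-missing number:
# any number is either at least 4 or at most 7
or_test <- b9 >= 4 | b9 <= 7

# With AND, only values from 4 to 7 pass
and_test <- b9 >= 4 & b9 <= 7

# The question's original three rules, with | replaced by &
fixed <- case_when(
  b9 <= 2           ~ "yellow",
  b9 >= 4 & b9 <= 7 ~ "white",
  b9 >= 9           ~ "green"
)
print(fixed)
```

With this version, values that fall in the gaps (3 or 8) would come out as NA, since no condition matches them.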
Answer 2
Score: 1
We can use cut() for this:
library(dplyr)
my_df %>%
mutate(NEW = cut(b9,
breaks = c(-Inf, 2, 4, 7, Inf),
labels = c("yellow", "white", "white", "green"),
include.lowest = TRUE))
b1 b2 b3 b4 b5 b6 b7 b8 b9 NEW
1 2 100 75 NA 2 9 1 NA 4 white
2 6 4 79 6 12 2 3 8 5 white
3 3 106 8 NA 1 4 7 4 7 white
4 6 102 0 10 7 6 7 5 9 green
5 4 6 2 12 8 7 4 1 5 white
6 2 6 3 8 5 6 2 4 1 yellow
7 1 1 9 3 5 6 2 1 1 yellow
8 9 1 5 6 6 7 9 3 2 yellow
9 NA 7 80 2 NA 9 5 6 12 green
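A variation on the same idea, as a sketch: because the two middle intervals share a label, three intervals are enough, which avoids repeating "white" in labels. This uses base R only. cut() returns a factor, and its intervals are closed on the right by default, so 2 falls in "yellow" and 7 in "white". Like the code above, it labels 3 as "white" and 8 as "green" rather than leaving them unlabelled.

```r
b9 <- c(4, 5, 7, 9, 5, 1, 1, 2, 12)

# Intervals: (-Inf, 2] -> yellow, (2, 7] -> white, (7, Inf] -> green
NEW <- cut(b9,
           breaks = c(-Inf, 2, 7, Inf),
           labels = c("yellow", "white", "green"))

# cut() gives a factor; convert if a character column is preferred
print(as.character(NEW))
```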
Answer 3
Score: 0
You can use between() from dplyr:
library(dplyr)

my_df %>%
  mutate(NEW = case_when(
    b9 <= 2 ~ "yellow",
    between(b9, 4, 7) ~ "white",   # inclusive: 4 <= b9 <= 7
    b9 >= 9 ~ "green"
  ))
Output:
b1 b2 b3 b4 b5 b6 b7 b8 b9 NEW
1 2 100 75 NA 2 9 1 NA 4 white
2 6 4 79 6 12 2 3 8 5 white
3 3 106 8 NA 1 4 7 4 7 white
4 6 102 0 10 7 6 7 5 9 green
5 4 6 2 12 8 7 4 1 5 white
6 2 6 3 8 5 6 2 4 1 yellow
7 1 1 9 3 5 6 2 1 1 yellow
8 9 1 5 6 6 7 9 3 2 yellow
9 NA 7 80 2 NA 9 5 6 12 green
Values that match none of the conditions (e.g., 8) will be NA.
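If NA is not what you want for the gaps, one option is an explicit fallback. The sketch below uses a made-up vector x and a hypothetical label "unclassified" (neither is from the question) and assumes dplyr is installed.

```r
library(dplyr)

# Hypothetical test values that include the gaps 3 and 8
x <- c(2, 3, 4, 7, 8, 9)

res <- case_when(
  x <= 2           ~ "yellow",
  between(x, 4, 7) ~ "white",         # same as x >= 4 & x <= 7
  x >= 9           ~ "green",
  TRUE             ~ "unclassified"   # hypothetical label for the gaps
)
print(res)
```

Because TRUE matches anything left over, a missing value would also receive the fallback label here.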