June 1, 2023 22:08:46

How to group variables that fall within a range of numbers

Question

I have a data frame like this:

    my_df <- data.frame(
      b1 = c(2, 6, 3, 6, 4, 2, 1, 9, NA),
      b2 = c(100, 4, 106, 102, 6, 6, 1, 1, 7),
      b3 = c(75, 79, 8, 0, 2, 3, 9, 5, 80),
      b4 = c(NA, 6, NA, 10, 12, 8, 3, 6, 2),
      b5 = c(2, 12, 1, 7, 8, 5, 5, 6, NA),
      b6 = c(9, 2, 4, 6, 7, 6, 6, 7, 9),
      b7 = c(1, 3, 7, 7, 4, 2, 2, 9, 5),
      b8 = c(NA, 8, 4, 5, 1, 4, 1, 3, 6),
      b9 = c(4, 5, 7, 9, 5, 1, 1, 2, 12))

I want to create a new column (NEW) based on the following rules.

If b9 is <= 2, write yellow.
If b9 is between 4 and 7, write white.
If b9 is >= 9, write green.

The idea is to create something like this:

    my_df1 <- data.frame(
      b1 = c(2, 6, 3, 6, 4, 2, 1, 9, NA),
      b2 = c(100, 4, 106, 102, 6, 6, 1, 1, 7),
      b3 = c(75, 79, 8, 0, 2, 3, 9, 5, 80),
      b4 = c(NA, 6, NA, 10, 12, 8, 3, 6, 2),
      b5 = c(2, 12, 1, 7, 8, 5, 5, 6, NA),
      b6 = c(9, 2, 4, 6, 7, 6, 6, 7, 9),
      b7 = c(1, 3, 7, 7, 4, 2, 2, 9, 5),
      b8 = c(NA, 8, 4, 5, 1, 4, 1, 3, 6),
      b9 = c(4, 5, 7, 9, 5, 1, 1, 2, 12),
      NEW = c("white", "white", "white", "green", "white", "yellow", "yellow", "yellow", "green"))

I thought something like this would do it, but it didn't:

    greater_threshold <- 2
    greater_threshold1 <- 4
    greater_threshold2 <- 7
    greater_threshold3 <- 9

    my_df1 <- my_df %>%
      mutate(NEW = case_when(
        b9 <= greater_threshold ~ "yellow",
        b9 >= greater_threshold1 | b9 <= greater_threshold2 ~ "white",
        b9 >= greater_threshold3 ~ "green"))

Any help will be appreciated.

Answer 1

Score: 1

Your setup has some issues. For example, what you say you want would leave no label when b9 equals 3 or 8:

> If b9 is <= 2 write yellow. If b9 is between 4 and 7 write white. If
> b9 is >= 9 write green

I'm going to change the "white" condition to label b9 between 3 and 7, and "green" to >= 8, to maintain sanity. Then this should work:

    library(dplyr)

    greater_threshold1 <- 2
    greater_threshold2 <- 7
    my_df <- mutate(my_df,
                    NEW = case_when(
                      b9 > greater_threshold2 ~ 'green',   # 8 and above
                      b9 > greater_threshold1 ~ 'white',   # 3 to 7
                      TRUE ~ 'yellow'                      # everything else
                    )) %>% print()

Output:

  b1  b2 b3 b4 b5 b6 b7 b8 b9    NEW
1  2 100 75 NA  2  9  1 NA  4  white
2  6   4 79  6 12  2  3  8  5  white
3  3 106  8 NA  1  4  7  4  7  white
4  6 102  0 10  7  6  7  5  9  green
5  4   6  2 12  8  7  4  1  5  white
6  2   6  3  8  5  6  2  4  1 yellow
7  1   1  9  3  5  6  2  1  1 yellow
8  9   1  5  6  6  7  9  3  2 yellow
9 NA   7 80  2 NA  9  5  6 12  green

case_when works best if you start with the narrowest condition and move to wider ones as you go down the list of conditions. Each value gets the label of the first condition it matches, so you don't have to worry about the fact that, for example, b9 = 12 would match both of the first two conditions: it is labeled by the first one, and the later conditions are ignored for that row. You don't even have to define your most permissive condition; just use TRUE, which basically means "if you've gotten this far, here's the assignment you are left with".
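To see the first-match rule on a small scale, here is a minimal sketch. The vector `x` is made up for illustration and is not from the question's data, and the sketch assumes dplyr is installed. It also shows a side effect worth knowing about: a missing value fails every comparison, so the TRUE fallback labels it "yellow" unless NA is handled first.

```r
library(dplyr)

# Illustrative values only (not the question's data), including an NA
x <- c(1, 5, 12, NA)

# Narrowest condition first: each element takes the first condition it matches
narrow_first <- case_when(
  x > 7 ~ "green",
  x > 2 ~ "white",
  TRUE  ~ "yellow"
)

# Wider condition first: 12 is caught by x > 2, so "green" is never assigned
wide_first <- case_when(
  x > 2 ~ "white",
  x > 7 ~ "green",
  TRUE  ~ "yellow"
)

# Handle NA explicitly before the catch-all so it stays missing
na_aware <- case_when(
  is.na(x) ~ NA_character_,
  x > 7    ~ "green",
  x > 2    ~ "white",
  TRUE     ~ "yellow"
)

print(narrow_first)  # "yellow" "white" "green" "yellow"
print(wide_first)    # "yellow" "white" "white" "yellow"
print(na_aware)      # "yellow" "white" "green" NA
```

Whether a missing b9 should fall into the catch-all label or stay NA depends on the data; the question's b9 column has no missing values, so the answer's code is unaffected.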

Your code goes the other way, so e.g. b9 = 12 gets assigned "white" because it satisfies the second condition, b9 >= greater_threshold1 | b9 <= greater_threshold2 ~ "white", and case_when never gets to the one you really want, b9 >= greater_threshold3 ~ "green". Every number is either >= 4 or <= 7, so with OR that condition matches any non-missing value. You can also get what you want if you change the OR operator in the second condition to AND, i.e. b9 >= greater_threshold1 & b9 <= greater_threshold2 ~ "white"; values such as 3 and 8 are then left as NA. If you use the narrow-to-wide approach that I describe, though, you don't need to mess with &, plus you get cleaner code.
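The difference between the two operators can be checked directly on the question's b9 values. This is a small sketch, assuming dplyr is installed; the thresholds are written as literals rather than the greater_threshold variables to keep it short.

```r
library(dplyr)

# b9 column from the question
b9 <- c(4, 5, 7, 9, 5, 1, 1, 2, 12)

# With OR, every non-missing number is >= 4 or <= 7, so this is TRUE everywhere
or_match <- b9 >= 4 | b9 <= 7

# With AND, only values inside the closed range [4, 7] match
and_match <- b9 >= 4 & b9 <= 7

print(all(or_match))  # TRUE
print(and_match)

# The question's three conditions with | replaced by &
fixed <- case_when(
  b9 <= 2           ~ "yellow",
  b9 >= 4 & b9 <= 7 ~ "white",
  b9 >= 9           ~ "green"
)
print(fixed)
```

With the AND version, `fixed` reproduces the NEW column the question asks for.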

Answer 2

Score: 1

We can use cut() for this:

    library(dplyr)
    my_df %>%
      mutate(NEW = cut(b9,
                       breaks = c(-Inf, 2, 4, 7, Inf),
                       labels = c("yellow", "white", "white", "green"),
                       include.lowest = TRUE))

Output:

  b1  b2 b3 b4 b5 b6 b7 b8 b9    NEW
1  2 100 75 NA  2  9  1 NA  4  white
2  6   4 79  6 12  2  3  8  5  white
3  3 106  8 NA  1  4  7  4  7  white
4  6 102  0 10  7  6  7  5  9  green
5  4   6  2 12  8  7  4  1  5  white
6  2   6  3  8  5  6  2  4  1 yellow
7  1   1  9  3  5  6  2  1  1 yellow
8  9   1  5  6  6  7  9  3  2 yellow
9 NA   7 80  2 NA  9  5  6 12  green
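As a variation on the call above, here is a sketch with one interval per distinct label, which avoids repeating "white" in `labels`. It uses base R only and relies on the default `right = TRUE`, so the intervals are (-Inf, 2], (2, 7] and (7, Inf). Like the cut() call in this answer, it labels 3 as white and 8 as green rather than leaving them unlabeled. cut() returns a factor, so as.character() is used here to get plain strings.

```r
# b9 column from the question
b9 <- c(4, 5, 7, 9, 5, 1, 1, 2, 12)

# Three intervals, three labels: (-Inf, 2], (2, 7], (7, Inf)
new_factor <- cut(b9,
                  breaks = c(-Inf, 2, 7, Inf),
                  labels = c("yellow", "white", "green"))

# Convert the factor to a character vector
new_labels <- as.character(new_factor)
print(new_labels)

# The values the question leaves unspecified still receive a label
gap_labels <- as.character(cut(c(3, 8),
                               breaks = c(-Inf, 2, 7, Inf),
                               labels = c("yellow", "white", "green")))
print(gap_labels)  # "white" "green"
```

If the gaps should stay unlabeled instead, a case_when with explicit ranges is a better fit than cut(), since cut() assigns every value inside the outer breaks to some interval.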

Answer 3

Score: 0

You can use between() from dplyr:

    library(dplyr)
    my_df %>%
      mutate(NEW = case_when(
        b9 <= 2 ~ "Yellow",
        between(b9, 4, 7) ~ "white",
        b9 >= 9 ~ "green"
      ))

Output:

  b1  b2 b3 b4 b5 b6 b7 b8 b9    NEW
1  2 100 75 NA  2  9  1 NA  4  white
2  6   4 79  6 12  2  3  8  5  white
3  3 106  8 NA  1  4  7  4  7  white
4  6 102  0 10  7  6  7  5  9  green
5  4   6  2 12  8  7  4  1  5  white
6  2   6  3  8  5  6  2  4  1 Yellow
7  1   1  9  3  5  6  2  1  1 Yellow
8  9   1  5  6  6  7  9  3  2 Yellow
9 NA   7 80  2 NA  9  5  6 12  green

Values that don't fall within any of the conditions (e.g., 3 or 8) will be NA.
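The NA behaviour can be seen on a short vector that includes the gap values. This is a sketch, assuming dplyr is installed; the vector `x` and the "unclassified" label are made up for illustration. between() is inclusive at both ends, so 4 and 7 both count as white.

```r
library(dplyr)

# Illustrative values covering each range and both gaps (3 and 8)
x <- c(2, 3, 4, 7, 8, 9)

# No catch-all: values matching no condition become NA
with_gaps <- case_when(
  x <= 2           ~ "yellow",
  between(x, 4, 7) ~ "white",
  x >= 9           ~ "green"
)
print(with_gaps)      # "yellow" NA "white" "white" NA "green"

# Optional catch-all; "unclassified" is a hypothetical label
with_fallback <- case_when(
  x <= 2           ~ "yellow",
  between(x, 4, 7) ~ "white",
  x >= 9           ~ "green",
  TRUE             ~ "unclassified"
)
print(with_fallback)  # "yellow" "unclassified" "white" "white" "unclassified" "green"
```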
