March 12, 2023, 12:38:30

Perform a logical test over a group_by() of rows, but mutate the rows individually?

Question:

How do I perform a logical test over a group of rows, but have the output performed on individual rows in R?

Setup:

I have two items, each broken into the 4 quarters of a year. If I shipped at least once during the year, I need to do follow-on calculations on that quarter, but if I had no shipments during the year, I need to show NA so that I can exclude it from those calculations.

df <- data.frame(item = c(rep("A", 4), rep("B", 4)),
                 value = c(0, NA, 5, 1, 0, NA, NA, NA))
> df
  item value
1    A     0
2    A    NA
3    A     5
4    A     1
5    B     0
6    B    NA
7    B    NA
8    B    NA

Desired Output

The end result should be this:

  item  value new_value
  <chr> <dbl>     <dbl>
1 A         0         0
2 A        NA         0
3 A         5         5
4 A         1         1
5 B         0        NA
6 B        NA        NA
7 B        NA        NA
8 B        NA        NA

What I Have Tried:

I'm generally using the library(tidyverse), but I'm not absolutely wedded to it if there's a base solution. My first thought, however, throws an error:

library(tidyverse)
df %>%
  group_by(item) %>%
  mutate(new_value = if_else(sum(value, na.rm = TRUE) > 0, coalesce(value, 0), NA_real_))
Error in `mutate()`:
! Problem while computing `new_value = if_else(sum(value, na.rm = TRUE) > 0, coalesce(value, 0), NA_real_)`.
ℹ The error occurred in group 1: item = "A".
Caused by error in `if_else()`:
! `true` must be length 1 (length of `condition`), not 4.

I'm only using group_by() so that sum(value, na.rm = TRUE) acts over the whole group, but then I want coalesce() to act on each individual row. I also know that dplyr's if_else() operates differently from base ifelse(), so I tried that:

df %>%
  group_by(item) %>%
  mutate(new_value = ifelse(sum(value, na.rm = TRUE) > 0, coalesce(value, 0), NA_real_))
# A tibble: 8 × 3
# Groups:   item [2]
  item  value new_value
  <chr> <dbl>     <dbl>
1 A         0         0
2 A        NA         0
3 A         5         0   # should be '5'
4 A         1         0   # should be '1'
5 B         0        NA
6 B        NA        NA
7 B        NA        NA
8 B        NA        NA

...which is closer to the desired output, but it appears to only use the first row in the group, and I want the action to be on each individual row, as you see in the Desired Output above.

What am I misunderstanding?

Answer 1

Score: 2

You could also use:

df %>%
   group_by(item) %>%
   mutate(new_val = coalesce(value,0) * any(value>0))
# A tibble: 8 × 3
# Groups:   item [2]
  item  value new_val
  <chr> <dbl>   <dbl>
1 A         0       0
2 A        NA       0
3 A         5       5
4 A         1       1
5 B         0      NA
6 B        NA      NA
7 B        NA      NA
8 B        NA      NA
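
A note on why this works, sketched with plain vectors rather than the grouped data frame: `any()` returns `NA` when no element is `TRUE` but at least one is `NA`, and multiplying by `NA` gives `NA`. The helper name `scale_by_any` is made up for illustration, and `ifelse(is.na(value), 0, value)` is used here as a base-R stand-in for `coalesce(value, 0)`.

```r
# Hypothetical helper mirroring coalesce(value, 0) * any(value > 0) in base R
scale_by_any <- function(value) {
  filled <- ifelse(is.na(value), 0, value)  # base-R stand-in for coalesce(value, 0)
  filled * any(value > 0)                   # TRUE acts as 1, FALSE as 0, NA stays NA
}

a <- scale_by_any(c(0, NA, 5, 1))    # any() is TRUE, so the filled values are kept
b <- scale_by_any(c(0, NA, NA, NA))  # any() is NA, so every row becomes NA
z <- scale_by_any(c(0, 0, 0, 0))     # any() is FALSE, so rows become 0, not NA
```

One consequence worth keeping in mind: a group made up only of zeros with no `NA` would come out as 0 rather than `NA`, so this approach depends on the never-shipped groups containing at least one `NA`.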

Answer 2

Score: 1

# Possible Answer
As I got ready to post the question, one of the suggestions from SO pointed me to this similar [question.][1]
Reading @akrun's answer, I tried adding the repetition function to the logical test, and got the result I want:
```r
df %>%
  group_by(item) %>%
  mutate(new_value = ifelse(rep(sum(value, na.rm = TRUE) > 0, n()), coalesce(value, 0), NA_real_))
```

I'm really curious whether there are other solutions out there that might be better or just good to know as alternatives, so I'm not going to accept this answer until I've given this some time to marinate.
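
The reason `rep()` helps can be seen without dplyr: base `ifelse()` returns a result shaped like its `test` argument, so a length-1 test yields only the first element of `yes`, which `mutate()` then recycles across the group. A small sketch using group A's values (the variable names are illustrative, and `ifelse(is.na(value), 0, value)` stands in for `coalesce(value, 0)`):

```r
value  <- c(0, NA, 5, 1)                  # group A from the question
filled <- ifelse(is.na(value), 0, value)  # base-R stand-in for coalesce(value, 0)
test   <- sum(value, na.rm = TRUE) > 0    # a single TRUE for the whole group

# Length-1 test: the result has length 1 and holds only filled[1]
short <- ifelse(test, filled, NA_real_)

# Test repeated to the group size: one result per row
full <- ifelse(rep(test, length(value)), filled, NA_real_)
```

Here `short` has a single element while `full` has one element per row, which matches the difference between the two outputs shown in the question and in this answer.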

Answer 3

Score: 1

You want to use if/else instead of ifelse because there is a single condition per group to check rather than a condition per row.

library(dplyr)
df |>
  mutate(value = if (any(value > 0, na.rm = TRUE)) pmax(value, 0, na.rm = TRUE) else NA, .by = item)
  item value
1    A     0
2    A     0
3    A     5
4    A     1
5    B    NA
6    B    NA
7    B    NA
8    B    NA
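
Since the question leaves room for a base solution, the same one-condition-per-group idea might be sketched with `ave()`, which applies a function to each group's values and puts the results back in row order. This is an illustrative alternative rather than part of the answer above; the `new_value` column name follows the question, and the test uses the question's `sum(...) > 0` condition.

```r
df <- data.frame(item = c(rep("A", 4), rep("B", 4)),
                 value = c(0, NA, 5, 1, 0, NA, NA, NA))

# One scalar test per group via if/else, then a per-row fill inside the branch
df$new_value <- ave(df$value, df$item, FUN = function(v) {
  if (sum(v, na.rm = TRUE) > 0) {
    ifelse(is.na(v), 0, v)      # shipped at some point: treat NA as 0
  } else {
    rep(NA_real_, length(v))    # never shipped: the whole group is NA
  }
})
```

Because the condition is evaluated once per group with `if`/`else`, the vectorised `ifelse()` is only needed for the row-by-row replacement of `NA` with 0.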
