Perform a logical test over a group_by() of rows, but mutate the rows individually?

Question:

How do I perform a logical test over a group of rows, but have the output performed on individual rows in R?

Setup:

I have two items, each broken into 4 quarters of a year. If I shipped at least once during the year, I need to do follow-on calculations on each quarter, but if I had no shipments during the year, I need to show NA so that I can exclude the item from those calculations.

df <- data.frame(item = c(rep("A", 4), rep("B", 4)),
                 value = c(0, NA, 5, 1, 0, NA, NA, NA))

> df
  item value
1    A     0
2    A    NA
3    A     5
4    A     1
5    B     0
6    B    NA
7    B    NA
8    B    NA

Desired Output

The end result should be this:

  item  value new_value
  <chr> <dbl>     <dbl>
1 A         0         0
2 A        NA         0
3 A         5         5
4 A         1         1
5 B         0        NA
6 B        NA        NA
7 B        NA        NA
8 B        NA        NA

What I Have Tried:

I generally use library(tidyverse), but I'm not wedded to it if there's a base solution. My first attempt, however, throws an error:

library(tidyverse)

df %>%
  group_by(item) %>%
  mutate(new_value = if_else(sum(value, na.rm = TRUE) > 0, coalesce(value, 0), NA_real_))

Error in `mutate()`:
! Problem while computing `new_value = if_else(sum(value, na.rm = TRUE) > 0, coalesce(value, 0), NA_real_)`.
ℹ The error occurred in group 1: item = "A".
Caused by error in `if_else()`:
! `true` must be length 1 (length of `condition`), not 4.

I'm only using the group_by() in order to have the sum(x, na.rm = TRUE) act over the group, but then I want the coalesce() to act on each individual row. I also know that dplyr's if_else() operates differently than the base ifelse(), so I tried that:

df %>%
  group_by(item) %>%
  mutate(new_value = ifelse(sum(value, na.rm = TRUE) > 0, coalesce(value, 0), NA_real_))

# A tibble: 8 × 3
# Groups:   item [2]
  item  value new_value
  <chr> <dbl>     <dbl>
1 A         0         0
2 A        NA         0
3 A         5         0   # should be '5'
4 A         1         0   # should be '1'
5 B         0        NA
6 B        NA        NA
7 B        NA        NA
8 B        NA        NA

...which is closer to the desired output, but it appears to only use the first row in the group, and I want the action to be on each individual row, as you see in the Desired Output above.

What am I misunderstanding?

Answer 1

Score: 2

You could also use:

df %>%
   group_by(item) %>%
   mutate(new_val = coalesce(value, 0) * any(value > 0))

# A tibble: 8 × 3
# Groups:   item [2]
  item  value new_val
  <chr> <dbl>   <dbl>
1 A         0       0
2 A        NA       0
3 A         5       5
4 A         1       1
5 B         0      NA
6 B        NA      NA
7 B        NA      NA
8 B        NA      NA
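
One caveat with the `any(value > 0)` trick, shown here as a small sketch: item B becomes NA only because `any()` returns NA when it finds no TRUE but at least one NA. The data frame `df2` and the item "C" below are made up for illustration, and the sketch assumes a group of recorded zeros should also end up as NA, as the `sum(...) > 0` test in the question implies.

```r
library(dplyr)

# Hypothetical data: item "C" holds recorded zeros only, with no NA.
df2 <- data.frame(item  = c(rep("B", 4), rep("C", 4)),
                  value = c(0, NA, NA, NA, 0, 0, 0, 0))

res <- df2 %>%
  group_by(item) %>%
  mutate(
    # any() is NA for B (no TRUE, some NA) but FALSE for C (no NA at all),
    # so C is multiplied by FALSE and stays 0 instead of becoming NA.
    by_any = coalesce(value, 0) * any(value > 0),
    # Testing the group sum directly treats both groups the same way.
    by_sum = if (sum(value, na.rm = TRUE) > 0) coalesce(value, 0) else NA_real_
  ) %>%
  ungroup()

res
```

If every non-shipping group in your real data contains at least one NA, the multiplication version behaves the same as the sum-based one.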

Answer 2

Score: 1

# Possible Answer

As I got ready to post the question, one of the suggestions from SO pointed me to this similar [question.][1]

Reading @akrun's answer, I tried adding the repetition function to the logical test, and got the result I want:

```r
df %>%
  group_by(item) %>%
  mutate(new_value = ifelse(rep(sum(value, na.rm = TRUE) > 0, n()), coalesce(value, 0), NA_real_))
```

I'm really curious whether there are other solutions out there that might be better, or just good to know as alternatives, so I'm not going to accept this answer until I've given it some time to marinate.
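
For anyone wondering why `rep()` makes the difference, here is a minimal base R sketch of what happens inside a single group. The variable names (`test`, `filled`, `short`, `full`) are mine, and `ifelse(is.na(value), 0, value)` stands in for `coalesce(value, 0)` so the snippet runs without dplyr.

```r
value  <- c(0, NA, 5, 1)                  # the values of item "A"
filled <- ifelse(is.na(value), 0, value)  # row-wise replacement of NA by 0
test   <- sum(value, na.rm = TRUE) > 0    # one TRUE for the whole group

# ifelse() returns a result with the same length as `test`, so only the
# first element of `filled` survives; mutate() then recycles that single
# value across the group, which gives the column of zeros in the question.
short <- ifelse(test, filled, NA_real_)

# Repeating the test once per row gives ifelse() a test as long as the
# group, so each row keeps its own value.
full <- ifelse(rep(test, length(value)), filled, NA_real_)

short
full
```

dplyr's `if_else()` refuses to guess here: it requires `true` and `false` to have length 1 or the length of `condition`, which is why it raised an error instead of returning a shortened result.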

Answer 3

Score: 1

You want to use if/else instead of ifelse because there is a single condition per group to check rather than a condition per row.

library(dplyr)

df |>
  mutate(value = if (any(value > 0, na.rm = TRUE)) pmax(value, 0, na.rm = TRUE) else NA, .by = item)

  item value
1    A     0
2    A     0
3    A     5
4    A     1
5    B    NA
6    B    NA
7    B    NA
8    B    NA
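
Since the question also asked about a base solution, the same one-test-per-group idea can be sketched with `ave()`. This is only one possible approach, not something taken from the answers above; it assumes the `df` from the question and writes to a separate `new_value` column so the original `value` is kept.

```r
df <- data.frame(item = c(rep("A", 4), rep("B", 4)),
                 value = c(0, NA, 5, 1, 0, NA, NA, NA))

# ave() splits `value` by `item`, applies FUN to each piece, and puts the
# results back in the original row order.
df$new_value <- ave(df$value, df$item, FUN = function(v) {
  if (sum(v, na.rm = TRUE) > 0) {
    ifelse(is.na(v), 0, v)  # group shipped: replace NA by 0 row by row
  } else {
    NA_real_                # group never shipped: one NA, recycled over the group
  }
})

df
```

As in the dplyr version, a plain `if`/`else` is enough because the condition is a single value per group.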

huangapple
  • Posted on March 12, 2023, 12:38:30
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75711071.html