May 22, 2023, 21:46:10

Check a column for a sequence of values using dplyr

Question

I want to look for a pattern in my grouped data frame. I am trying to identify sequences of 3 consecutive rows that contain the value 1 in the Signal column, and to set the Alert column to TRUE once the 3rd row of the sequence contains 1.

I can do this when the number of lags I need to check is small, but if I wanted to expand this to look for 30 consecutive values, the ifelse becomes unmanageable.

library(dplyr)

df <- data.frame(Group = c("A", "A", "A", "A", "B", "B", "B", "C", "C", "C", "C"),
                 Signal = c(1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1))

Per group, check for three consecutive rows in the Signal column containing 1:

df <- df %>%
  group_by(Group) %>%
  mutate(Alert = ifelse(Signal == 1 & lag(Signal) == 1 & lag(Signal, 2) == 1, "Yes", "No"))

Is there a more scalable approach for larger lag values?

Answer 1

Score: 3

You can use `consecutive_id()` (available in dplyr 1.1.0 and later) to group by the current run of 1s and 0s in the Signal column, and use `row_number()` to check whether the run has reached at least `threshold` 1s. (You can, of course, remove the `consecutive_id(Signal)` helper column after the calculation is done.)

library(dplyr)
threshold <- 3
df |>
  group_by(Group, consecutive_id(Signal)) |>
  mutate(
    # Flag a row once it is at least the threshold-th row of a run of 1s
    Alert = ifelse(Signal == 1 & row_number() >= threshold, 1, 0)
  ) |>
  ungroup()

# A tibble: 11 × 4
   Group Signal `consecutive_id(Signal)` Alert
   <chr>  <dbl>                    <int> <dbl>
 1 A          1                        1     0
 2 A          1                        1     0
 3 A          1                        1     1
 4 A          0                        2     0
 5 B          0                        2     0
 6 B          1                        3     0
 7 B          0                        4     0
 8 C          0                        4     0
 9 C          1                        5     0
10 C          1                        5     0
11 C          1                        5     1
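
Since the question asks about scaling to something like 30 consecutive values, it may help to see the same idea wrapped in a function with the threshold as a parameter. This is only a sketch: `add_alert` and the helper column name `run` are made up for illustration and are not part of the original answer, and it assumes dplyr 1.1.0 or later (for `consecutive_id()`) and R 4.1 or later (for `|>`). The base R `rle()` lines at the end are an independent cross-check, not a replacement for the dplyr approach.

```r
library(dplyr)

df <- data.frame(
  Group  = c("A", "A", "A", "A", "B", "B", "B", "C", "C", "C", "C"),
  Signal = c(1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1)
)

# Hypothetical helper (illustrative name): flags every row that is at least
# the `threshold`-th consecutive 1 within its Group.
add_alert <- function(data, threshold = 3) {
  data |>
    # `run` is an illustrative name for the run-id helper column
    group_by(Group, run = consecutive_id(Signal)) |>
    mutate(Alert = as.integer(Signal == 1 & row_number() >= threshold)) |>
    ungroup() |>
    select(-run)  # drop the helper column once Alert is computed
}

result <- add_alert(df, threshold = 3)
result$Alert
# 0 0 1 0 0 0 0 0 0 0 1

# Cross-check with base R: position of each row within its run, per group
run_pos <- ave(df$Signal, df$Group,
               FUN = function(x) sequence(rle(x)$lengths))
base_alert <- as.integer(df$Signal == 1 & run_pos >= 3)
stopifnot(identical(result$Alert, base_alert))
```

With this shape, checking for 30 consecutive values would just be `add_alert(df, threshold = 30)`, with no extra `lag()` terms to write out.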
