Check a column for a sequence of values using dplyr
Question
I want to look for a pattern within each group of my grouped data frame. Specifically, I want to detect a run of 3 consecutive rows whose `Signal` value is 1, and set the `Alert` column to TRUE on the row where the third consecutive 1 appears.

I can do this when the number of lags to check is small, but if I wanted to look for 30 consecutive values, the `ifelse()` call would become unmanageable.
```r
library(dplyr)

df <- data.frame(
  Group  = c("A", "A", "A", "A", "B", "B", "B", "C", "C", "C", "C"),
  Signal = c(1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1)
)

# Per group, check for three consecutive rows in the Signal column containing 1.
# lag() returns NA for the first rows of each group, so Alert can be NA there.
df <- df %>%
  group_by(Group) %>%
  mutate(Alert = ifelse(Signal == 1 & lag(Signal) == 1 & lag(Signal, 2) == 1, "Yes", "No"))
```
Is there a more scalable approach for larger lag values?
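If you want to keep the lag-based logic from the question, the chain of comparisons does not have to be written out by hand: it can be built with `lapply()` over the lag sizes and collapsed with `Reduce()`. This is a minimal sketch assuming the same `df`; `all_last_k()` is a hypothetical helper name, and it passes `default = 0` to `dplyr::lag()` (an assumption about how you want the group edges treated) so rows near the start of a group come out FALSE instead of NA.

```r
library(dplyr)

df <- data.frame(
  Group  = c("A", "A", "A", "A", "B", "B", "B", "C", "C", "C", "C"),
  Signal = c(1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1)
)

# Hypothetical helper: TRUE where x and its previous (k - 1) values are all 1.
# Lags that run past the start of the group are filled with 0 (default = 0).
all_last_k <- function(x, k) {
  # One logical vector per lag 1, ..., k - 1
  lagged <- lapply(seq_len(k - 1), function(i) dplyr::lag(x, i, default = 0) == 1)
  # AND them together, starting from the current row's own check
  Reduce(`&`, lagged, init = x == 1)
}

out <- df %>%
  group_by(Group) %>%
  mutate(Alert = all_last_k(Signal, 3)) %>%
  ungroup()

print(out)
```

Changing `3` to `30` is the only edit needed for a longer run. The cost grows linearly with `k`, since one lagged comparison is built per lag.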
Answer 1
Score: 3
<details>
<summary>English:</summary>
You can use `consecutive_id()` (available in dplyr 1.1.0 and later) to group by the current run of 1s or 0s in the `Signal` column, and then use `row_number()` to check whether the run has reached at least `threshold` 1s. (You can, of course, drop the `consecutive_id(Signal)` helper column once the calculation is done.)
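To see what this grouping does, the two pieces can be computed on plain vectors built from the question's data. This sketch uses base R's `ave()` only as a stand-in for `group_by(Group, consecutive_id(Signal))` followed by `row_number()`; the vector names `signal`, `group`, `run_id` and `pos_in_run` are illustrative.

```r
library(dplyr)  # consecutive_id() needs dplyr >= 1.1.0

signal <- c(1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1)
group  <- c("A", "A", "A", "A", "B", "B", "B", "C", "C", "C", "C")

# Each run of identical consecutive values gets its own integer id
run_id <- consecutive_id(signal)
print(run_id)      # 1 1 1 2 2 3 4 4 5 5 5

# Position of each element within its (group, run) pair: the value
# row_number() returns once the rows are grouped by both keys
pos_in_run <- ave(seq_along(signal), group, run_id, FUN = seq_along)
print(pos_in_run)  # 1 2 3 1 1 1 1 1 1 2 3

# The alert fires wherever a run of 1s has reached the threshold
threshold <- 3
alert <- signal == 1 & pos_in_run >= threshold
print(which(alert))
```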
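```r
library(dplyr)

threshold <- 3

df |>
  # One group per (Group, run of identical Signal values)
  group_by(Group, consecutive_id(Signal)) |>
  # Inside a run, row_number() is the row's position in that run
  mutate(
    Alert = ifelse(Signal == 1 & row_number() >= threshold, 1, 0)
  ) |>
  ungroup()
```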
```
# A tibble: 11 × 4
   Group Signal consecutive_id(Signal) Alert
   <chr>  <dbl>                  <int> <dbl>
 1 A          1                      1     0
 2 A          1                      1     0
 3 A          1                      1     1
 4 A          0                      2     0
 5 B          0                      2     0
 6 B          1                      3     0
 7 B          0                      4     0
 8 C          0                      4     0
 9 C          1                      5     0
10 C          1                      5     0
11 C          1                      5     1
```
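If you are on a dplyr version older than 1.1.0, or would rather not carry the helper column, the same per-run position can be computed with base R's `rle()` and `sequence()` inside each group. This is a minimal sketch assuming the question's `df`; `run_pos` is a hypothetical temporary column name, and `Alert` is returned as a logical to match the TRUE/FALSE the question asks for.

```r
library(dplyr)

df <- data.frame(
  Group  = c("A", "A", "A", "A", "B", "B", "B", "C", "C", "C", "C"),
  Signal = c(1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1)
)

threshold <- 3

out <- df |>
  group_by(Group) |>
  mutate(
    # Position within the current run: 1, 2, 3, ... restarting whenever
    # Signal changes value (rle() splits the group's Signal into runs)
    run_pos = sequence(rle(Signal)$lengths),
    # Logical alert once a run of 1s reaches the threshold
    Alert = Signal == 1 & run_pos >= threshold
  ) |>
  ungroup() |>
  select(-run_pos)

print(out)
```

As with the `row_number()` version, every row from the `threshold`-th 1 onward in a run is flagged; replace `>=` with `==` if only the row that first reaches the threshold should be marked.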