Filter by value counts within groups
Question
I want to filter my grouped data frame based on the number of occurrences of a specific value within each group.
Some example data:
data <- data.frame(ID = sample(c("A","B","C","D"),100,replace = TRUE),
rt = runif(100,0.2,1),
lapse = sample(1:2,100,replace = TRUE))
In this case, the “lapse” column is my filter variable.
I want to exclude every “ID” group that has more than 15 rows where “lapse” == 2.
library(dplyr)
# Number of rows with lapse == 2 (TRUE) and lapse != 2 (FALSE) per ID
data %>% group_by(ID) %>% count(lapse == 2)
So if, for example, group “A” has 17 rows with “lapse” == 2, that whole group should be removed from the data frame.
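To pin down the threshold: "more than 15" is a strict comparison, so a group with exactly 15 such rows stays. The sketch below uses a small hand-built data frame (the group sizes are made up purely for illustration) and base R only: tapply() counts the lapse == 2 rows per ID, and every ID above 15 is excluded with %in%.

```r
# Hypothetical toy data: group A has 17 rows with lapse == 2,
# group B has exactly 15, group C has 3.
toy <- data.frame(
  ID    = rep(c("A", "B", "C"), times = c(20, 20, 5)),
  lapse = c(rep(2, 17), rep(1, 3),   # A: 17 rows with lapse 2
            rep(2, 15), rep(1, 5),   # B: 15 rows with lapse 2
            rep(2, 3),  rep(1, 2))   # C:  3 rows with lapse 2
)

# Count rows with lapse == 2 per ID
lapse2_counts <- tapply(toy$lapse == 2, toy$ID, sum)

# IDs to drop: strictly more than 15 such rows
drop_ids <- names(lapse2_counts)[lapse2_counts > 15]

# Remove every row belonging to a dropped ID
kept <- toy[!toy$ID %in% drop_ids, ]

print(lapse2_counts)
print(drop_ids)
print(unique(kept$ID))
```

Here A (17) is dropped, while B (exactly 15) and C (3) are kept.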
Answer 1
Score: 3
First, I created some reproducible data using set.seed and checked the number of values per group. In this case, only group D has more than 15 rows with lapse 2 (16); group B has exactly 15, so it is kept. You can use filter and sum the values with lapse 2 per group like this:
set.seed(7)
data <- data.frame(ID = sample(c("A","B","C","D"),100,replace = TRUE),
rt = runif(100,0.2,1),
lapse = sample(1:2,100,replace = TRUE))
library(dplyr)
# Check n values per group
data %>%
group_by(ID, lapse) %>%
summarise(n = n())
#> # A tibble: 8 × 3
#> # Groups: ID [4]
#> ID lapse n
#> <chr> <int> <int>
#> 1 A 1 8
#> 2 A 2 7
#> 3 B 1 13
#> 4 B 2 15
#> 5 C 1 18
#> 6 C 2 6
#> 7 D 1 17
#> 8 D 2 16
# Keep only ID groups with at most 15 rows where lapse == 2
data %>%
group_by(ID) %>%
filter(!(sum(lapse == 2) > 15))
#> # A tibble: 67 × 3
#> # Groups: ID [3]
#> ID rt lapse
#> <chr> <dbl> <int>
#> 1 B 0.517 2
#> 2 C 0.589 1
#> 3 C 0.598 2
#> 4 C 0.715 1
#> 5 B 0.475 2
#> 6 C 0.965 1
#> 7 B 0.234 1
#> 8 B 0.812 2
#> 9 C 0.517 1
#> 10 B 0.700 1
#> # … with 57 more rows
Created on 2023-01-08 with reprex v2.0.2
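For comparison, here is a minimal base R sketch of the same grouped filter, assuming dplyr is not available. ave() returns, for every row, the sum of lapse == 2 over that row's ID group, which plays the role of sum(lapse == 2) inside the grouped filter(). On the same seeded data it should keep the same groups as the dplyr version, although the exact sample can differ between R versions with different default random number generators.

```r
set.seed(7)
data <- data.frame(ID = sample(c("A", "B", "C", "D"), 100, replace = TRUE),
                   rt = runif(100, 0.2, 1),
                   lapse = sample(1:2, 100, replace = TRUE))

# For every row: how many rows in its ID group have lapse == 2
# (base R counterpart of sum(lapse == 2) inside a grouped filter())
n_lapse2 <- ave(as.integer(data$lapse == 2), data$ID, FUN = sum)

# Keep only rows whose ID group has at most 15 such rows
filtered <- data[n_lapse2 <= 15, ]

# Cross-tabulate the remaining rows by ID and lapse
table(filtered$ID, filtered$lapse)
```

Unlike the dplyr result, filtered is a plain, ungrouped data.frame, and it keeps the original row names of the retained rows.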