Filter different number of rows per group
Question
I want to keep x rows per id, but x differs for each id.
Example dataset:
df <- data.frame(id = c('P1', 'P1', 'P1', 'P1', 'P2', 'P2', 'P2', 'P2', 'P3', 'P3'),
                 points = c(56, 94, 17, 57, 55, 15, 37, 44, 55, 32))
The data and the code below are adapted from here.
library(dplyr)
df %>%
  group_by(id) %>%
  filter(row_number() %in% c(1, 2))
This keeps the first two rows for each id. So far so good.
But I want to keep a different number of rows for each id, based on the values stored in a vector like this:
nrowtofilter <- c(3, 2, 1)
That is, I want 3 rows for P1, 2 for P2, and 1 for P3.
But when I run
df %>%
  group_by(id) %>%
  filter(row_number() %in% nrowtofilter)
I get the first 3 rows of each id.
How can I filter each id based on nrowtofilter?
Answer 1
Score: 6
A different approach with cur_group_id(), which doesn't require splitting the dataset into a list of data frames:
library(dplyr)
df %>%
group_by(id) %>%
filter(row_number() <= nrowtofilter[cur_group_id()])
#> # A tibble: 6 x 2
#> # Groups: id [3]
#> id points
#> <chr> <dbl>
#> 1 P1 56
#> 2 P1 94
#> 3 P1 17
#> 4 P2 55
#> 5 P2 15
#> 6 P3 55
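A note on why this works, with a small sketch. The original `%in%` test only asks whether a row position is one of 3, 2, or 1, which is true for the first three rows of every group. `cur_group_id()` instead picks one count per group, but it numbers groups in the order dplyr sorts the grouping keys, so `nrowtofilter` has to be listed in that same order. One way to avoid relying on position, assuming `id` is a character column, is to name the counts and look them up with `cur_group()`. The vector name `n_by_id` below is made up for illustration:

```r
library(dplyr)

df <- data.frame(
  id = c("P1", "P1", "P1", "P1", "P2", "P2", "P2", "P2", "P3", "P3"),
  points = c(56, 94, 17, 57, 55, 15, 37, 44, 55, 32)
)

# The membership test from the question: positions 1, 2 and 3 all match,
# whatever the group, so every id keeps its first three rows.
1:4 %in% c(3, 2, 1)
#> [1]  TRUE  TRUE  TRUE FALSE

# Hypothetical named vector: the names tie each count to an id, so the
# order in which the counts are written no longer matters.
n_by_id <- c(P3 = 1, P1 = 3, P2 = 2)

res <- df %>%
  group_by(id) %>%
  # cur_group() is a one-row tibble holding the current group's keys
  filter(row_number() <= n_by_id[[cur_group()$id]]) %>%
  ungroup()
```

Printing `res` should give the same six rows as the output above. If `id` were a factor, it would need `as.character()` before being used as a name.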
Answer 2
Score: 2
You could split the groups, and use map2 (or mapply) to slice the first rows over groups and n:
library(dplyr)
library(purrr)  # map2() comes from purrr, not dplyr
nrowtofilter <- c(3, 2, 1)
df %>%
group_split(id) %>%
map2(nrowtofilter, ~ slice_head(.x, n = .y)) %>%
bind_rows()
Output:
# A tibble: 6 × 2
id points
<chr> <dbl>
1 P1 56
2 P1 94
3 P1 17
4 P2 55
5 P2 15
6 P3 55
In base R, with the same logic:
split(df, df$id) |>
Map(f = function(x, y) head(x, y), y = nrowtofilter) |>
do.call(what = "rbind")
Answer 3
Score: 2
First create a lookup table:
nrowtofilter <- setNames(c(3, 2, 1), c('P1', 'P2', 'P3'))
# P1 P2 P3
# 3 2 1
Then use group_modify():
library(dplyr)
df %>%
group_by(id) %>%
group_modify(~ slice_head(.x, n = nrowtofilter[.y$id])) %>%
ungroup()
# # A tibble: 6 × 2
# id points
# <chr> <dbl>
# 1 P1 56
# 2 P1 94
# 3 P1 17
# 4 P2 55
# 5 P2 15
# 6 P3 55
where .x refers to the subset of rows for the given group, and .y is a one-row tibble with one column per grouping variable that identifies the group.
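The same named-lookup idea can also be sketched in base R without any packages. This is only an illustration, assuming the rows of each id are already in the desired order. The helper variable `pos` is a name chosen here, not part of the answer above:

```r
df <- data.frame(
  id = c("P1", "P1", "P1", "P1", "P2", "P2", "P2", "P2", "P3", "P3"),
  points = c(56, 94, 17, 57, 55, 15, 37, 44, 55, 32)
)
nrowtofilter <- setNames(c(3, 2, 1), c("P1", "P2", "P3"))

# Position of each row within its id: 1, 2, 3, ... restarting per group.
pos <- ave(seq_along(df$id), df$id, FUN = seq_along)

# Look up the allowed count for every row by id name, then keep rows
# whose within-group position does not exceed it.
limit <- unname(nrowtofilter[as.character(df$id)])
res <- df[pos <= limit, ]
```

Here `res` keeps the original row names (1, 2, 3, 5, 6, 9); `rownames(res) <- NULL` resets them if that matters.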
Answer 4
Score: 1
Another option, joining the counts onto the data first (the .by argument of filter() needs dplyr 1.1.0 or later):
library(dplyr)
tibble(id = unique(df$id), nrowtofilter) %>%
left_join(df, .) %>%
filter(row_number() <= first(nrowtofilter), .by = 'id') %>%
select(-nrowtofilter)
Output:
id points
1 P1 56
2 P1 94
3 P1 17
4 P2 55
5 P2 15
6 P3 55