February 23, 2023, 23:32:30

Filter different number of rows per group

Question

I want to filter x rows per id, but the x differs for each id.

Example dataset:

df <- data.frame(id = c('P1', 'P1', 'P1', 'P1', 'P2', 'P2', 'P2', 'P2', 'P3', 'P3'),
                 points = c(56, 94, 17, 57, 55, 15, 37, 44, 55, 32))

The data and the code below are adapted from here.

df %>%
  group_by(id) %>%
  filter(row_number() %in% c(1, 2))

This filters the first two rows for each id. So far so good.

But I want to filter a different number of rows for each id, based on the values stored in a vector like the one below:

nrowtofilter <- c(3, 2, 1)

Thus, I want to filter 3 rows for P1, 2 for P2, and 1 for P3.

But when I do

df %>%
  group_by(id) %>%
  filter(row_number() %in% nrowtofilter)

I extract the first 3 rows of each ID.

How can I filter ids based on nrowtofilter?
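
A note on why the attempt above keeps up to three rows in every group: `%in%` only tests whether each row number belongs to the set {3, 2, 1}, and that same test runs inside every group. A minimal base R sketch, where `row_idx` is just an illustrative name standing in for what `row_number()` returns in a four-row group:

```r
# Row numbers within one group of four rows, as row_number() would produce them
row_idx <- 1:4

# %in% checks set membership only, so the per-id meaning of c(3, 2, 1) is lost:
# rows 1, 2 and 3 match, row 4 does not
keep <- row_idx %in% c(3, 2, 1)
print(keep)
```

So the vector is never matched to individual ids; each group is simply filtered to rows 1 through 3.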

Answer 1

Score: 6

A different approach with cur_group_id, which doesn't require breaking the dataset into a list of dataframes:

library(dplyr)

df %>%
  group_by(id) %>%
  filter(row_number() <= nrowtofilter[cur_group_id()])

#> # A tibble: 6 x 2
#> # Groups:   id [3]
#>   id    points
#>   <chr>  <dbl>
#> 1 P1        56
#> 2 P1        94
#> 3 P1        17
#> 4 P2        55
#> 5 P2        15
#> 6 P3        55
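
One caveat with indexing by `cur_group_id()`: it relies on `nrowtofilter` being in the same order as the groups, which `group_by()` sorts by key. A possible variant, sketched here under the assumption that the counts can be kept in a named vector (the name `n_by_id` is hypothetical, not from the original answer), looks each count up by the group's key via `cur_group()` instead:

```r
library(dplyr)

df <- data.frame(id = c('P1', 'P1', 'P1', 'P1', 'P2', 'P2', 'P2', 'P2', 'P3', 'P3'),
                 points = c(56, 94, 17, 57, 55, 15, 37, 44, 55, 32))

# Hypothetical named lookup; deliberately NOT in group order
n_by_id <- c(P3 = 1, P1 = 3, P2 = 2)

res <- df %>%
  group_by(id) %>%
  # cur_group() is a one-row tibble holding the current group's key(s);
  # [[ ]] with a character key returns the matching count without its name
  filter(row_number() <= n_by_id[[as.character(cur_group()$id)]]) %>%
  ungroup()

print(res)
```

Because the lookup is by name, reordering `n_by_id` or the rows of `df` should not change which count is applied to which id.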

Answer 2

Score: 2

You could split the groups, and use map2 (or mapply) to slice the first rows over groups and n:

library(dplyr)
library(purrr)  # map2() comes from purrr, not dplyr
nrowtofilter <- c(3, 2, 1)
df %>%
  group_split(id) %>%                              # one tibble per id
  map2(nrowtofilter, ~ slice_head(.x, n = .y)) %>% # first n rows of each
  bind_rows()

Output:

# A tibble: 6 × 2
  id    points
  <chr>  <dbl>
1 P1        56
2 P1        94
3 P1        17
4 P2        55
5 P2        15
6 P3        55

In base R, with the same logic:

split(df, df$id) |>
  Map(f = function(x, y) head(x, y), y = nrowtofilter) |>
  do.call(what = "rbind")
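
For comparison, the same result can be reached in base R without splitting at all. The sketch below is an alternative technique, not part of the original answer: it uses `ave()` to compute each row's position within its id and compares that with a per-id count. The named vector `n_by_id` and the helper variable `pos` are hypothetical names introduced for illustration.

```r
df <- data.frame(id = c('P1', 'P1', 'P1', 'P1', 'P2', 'P2', 'P2', 'P2', 'P3', 'P3'),
                 points = c(56, 94, 17, 57, 55, 15, 37, 44, 55, 32))

# Hypothetical named lookup: how many rows to keep for each id
n_by_id <- c(P1 = 3, P2 = 2, P3 = 1)

# Position of each row within its id group: 1, 2, 3, 4, 1, 2, 3, 4, 1, 2
pos <- ave(seq_along(df$id), df$id, FUN = seq_along)

# Per-row limit, looked up by id (as.character guards against factor ids)
limit <- unname(n_by_id[as.character(df$id)])

# Keep rows whose within-group position does not exceed the limit for that id
res <- df[pos <= limit, ]
print(res)
```

This keeps the original row names and row order, and the lookup by name does not depend on the order of the groups.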

Answer 3

Score: 2

First create a lookup table:

nrowtofilter <- setNames(c(3, 2, 1), c('P1', 'P2', 'P3'))
# P1 P2 P3 
#  3  2  1

Then use group_modify():

library(dplyr)

df %>%
  group_by(id) %>%
  group_modify(~ slice_head(.x, n = nrowtofilter[.y$id])) %>%
  ungroup()

# # A tibble: 6 × 2
#   id    points
#   <chr>  <dbl>
# 1 P1        56
# 2 P1        94
# 3 P1        17
# 4 P2        55
# 5 P2        15
# 6 P3        55

Here .x refers to the subset of rows for the given group, and .y is a one-row tibble with one column per grouping variable that identifies the group.

Answer 4

Score: 1

Another option:

library(dplyr)
# Attach each id's count as a column, then filter per id
tibble(id = unique(df$id), nrowtofilter) %>%
  left_join(df, .) %>%
  # the .by argument requires dplyr >= 1.1.0
  filter(row_number() <= first(nrowtofilter), .by = 'id') %>%
  select(-nrowtofilter)

Output:

  id points
1 P1     56
2 P1     94
3 P1     17
4 P2     55
5 P2     15
6 P3     55
