Is there a way to split a grouped dataframe by size?
Question
table_by_size <- original_df %>% group_by(grouping_factor) %>% group_split()
gives a nice, clean `list_of` of the data split from the original data.frame.
original_df %>% group_by(grouping_factor) %>% summarize(counts = n())
is a very easy way to count the size of all groups.
original_df %>% group_by(grouping_factor) %>% filter(n() == <size #>)
lets me keep only the groups of a specific size.
Is there a way to combine all of these and split a grouped data frame based on group size? That is, put all groups of size `n` into one item: all groups of size 3 as one item, all groups of size 4 as another, and so forth.
A simple alternative is to loop over the values of `n()` and build the sub-frames manually, but before I do that I wanted to see whether there is a tidyverse way to do this.
It's almost all there in `dplyr`, but by the looks of it one cannot customize how `group_split()` does the splitting. Is this right?
Answer 1
Score: 2
Add the size as a column and split by that:
metabolite_table |>
  # .by (dplyr >= 1.1.0) computes n() per grouping_factor without leaving the data grouped
  mutate(group_size = n(), .by = grouping_factor) |>
  group_by(group_size) |>
  group_split()
The names are a bit confusing in this case. Assuming `grouping_factor` has a more meaningful name, I'd suggest giving `group_size` a similarly meaningful name.
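To see what this produces, here is a minimal sketch on made-up data. `toy_df` and its values are hypothetical and not from the question; only the pipeline itself follows the answer above.

```r
library(dplyr)

# Hypothetical toy data: groups "a" and "b" have 2 rows each, group "c" has 3.
toy_df <- tibble(
  grouping_factor = c("a", "a", "b", "b", "c", "c", "c"),
  value = 1:7
)

pieces <- toy_df |>
  mutate(group_size = n(), .by = grouping_factor) |>  # size of each row's group
  group_by(group_size) |>
  group_split()

length(pieces)                       # one list element per distinct group size
unique(pieces[[1]]$grouping_factor)  # groups "a" and "b" share the size-2 piece
unique(pieces[[2]]$grouping_factor)  # group "c" is alone in the size-3 piece
```

The pieces come back ordered by `group_size`, so the smallest group size is first.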
Answer 2
Score: 2
How about, for example,
library(tidyverse)
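mtcars %>%
  group_by(cyl) %>%
  mutate(GroupSize = n()) %>%  # size of each cyl group
  group_by(GroupSize) %>%      # regroup by size, replacing the cyl grouping
  group_split(.keep = FALSE)   # .keep = FALSE drops GroupSize from each piece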
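As a rough illustration of what `.keep` changes, the sketch below runs the same pipeline on `mtcars` with and without it. The object names `sized`, `with_size` and `without_size` are made up for this example.

```r
library(dplyr)

sized <- mtcars %>%
  group_by(cyl) %>%
  mutate(GroupSize = n()) %>%
  group_by(GroupSize)

with_size    <- group_split(sized)                 # .keep = TRUE is the default
without_size <- group_split(sized, .keep = FALSE)  # grouping column removed

"GroupSize" %in% names(with_size[[1]])     # TRUE
"GroupSize" %in% names(without_size[[1]])  # FALSE
vapply(without_size, nrow, integer(1))     # 7 11 14: the 6-, 4- and 8-cylinder cars
```

In `mtcars` every `cyl` value happens to have a different count, so each piece holds exactly one cylinder group. With `.keep = FALSE` the size is no longer stored in the piece, so it has to be recovered from the row count.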
Answer 3
Score: 0
original_data_frame |>
  mutate(n = n(), .by = grouping_factor) |>
  group_split(n)
also works.
Credit to [this Reddit comment](https://www.reddit.com/r/Rlanguage/comments/15i45b4/comment/jus2jkv/?utm_source=share&utm_medium=web2x&context=3)
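One thing to keep in mind with all of these: `group_split()` returns an unnamed list, so the size each piece belongs to is not visible from the list itself. If looking pieces up by size matters, one possible variation (not taken from any of the answers) is to swap the last step for base R's `split()`, which names each element after the value it was split on. `toy_df` below is hypothetical data.

```r
library(dplyr)

# Hypothetical toy data: groups "a" and "b" have 2 rows each, group "c" has 3.
toy_df <- tibble(
  grouping_factor = c("a", "a", "b", "b", "c", "c", "c"),
  value = 1:7
)

with_size <- toy_df |>
  mutate(group_size = n(), .by = grouping_factor)

# base::split() names each element after the group_size value it holds.
named_pieces <- split(with_size, with_size$group_size)

names(named_pieces)   # "2" "3"
named_pieces[["3"]]   # every row belonging to a group of size 3
```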