Is there a way to split a grouped dataframe by size?

Question

table_by_size <- original_df %>% group_by(grouping_factor) %>% group_split() gives a nice, clean list_of containing the split data from the original data frame.

original_df %>% group_by(grouping_factor) %>% summarize(counts = n()) is a very easy way to count the size of every group.

original_df %>% group_by(grouping_factor) %>% filter(n() == <size #>) lets me keep only the groups of a specific size.

Is there a way for me to combine all of these and split a grouped data frame based on group size? That is, all groups of size n should end up together as one item: all groups of size 3 as one, all groups of size 4 as one, and so forth.

A simple alternative is to loop over the values of n() and build the sub-frames manually, but before I do that I wanted to see whether there is a tidyverse method for this.

It's almost all there in dplyr, but by the looks of it one cannot customize how group_split() does the splitting. Is this right?

Answer 1

Score: 2

Add the size as a column and split by that:

metabolite_table |>
  mutate(group_size = n(), .by = grouping_factor) |>
  group_by(group_size) |>
  group_split()

The names are a bit confusing in this case. Assuming grouping_factor has a more meaningful name, I'd suggest giving group_size a similarly meaningful name.
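
To make the result concrete, here is a minimal sketch on a made-up data frame. The names toy_df, value and by_size are hypothetical and not from the question, and the .by argument assumes dplyr 1.1.0 or later:

```r
library(dplyr)  # `.by` in mutate() needs dplyr >= 1.1.0

# Hypothetical toy data: groups "a" and "b" have 3 rows, "c" has 4, "d" has 2
toy_df <- tibble(
  grouping_factor = c("a", "a", "a", "b", "b", "b", "c", "c", "c", "c", "d", "d"),
  value = 1:12
)

by_size <- toy_df |>
  mutate(group_size = n(), .by = grouping_factor) |>  # size of each row's group
  group_by(group_size) |>                             # regroup by that size
  group_split()                                       # one tibble per distinct size

length(by_size)        # 3 distinct sizes: 2, 3 and 4
sapply(by_size, nrow)  # 2 6 4
```

Groups a and b both have three rows, so they end up together in the same list element, which is the behaviour the question asks for.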

Answer 2

Score: 2

How about, for example,

library(tidyverse)

mtcars %>%
  group_by(cyl) %>%
  mutate(GroupSize = n()) %>%
  group_by(GroupSize) %>%
  group_split(.keep = FALSE)
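
One thing to be aware of: with .keep = FALSE the GroupSize column is dropped from each piece, and group_split() returns an unnamed list, so the size each piece corresponds to is no longer visible. A possible workaround, sketched here on the assumption that naming the list is acceptable, is to take the sizes from group_keys(). The names sized and pieces are hypothetical:

```r
library(dplyr)

# Same pipeline as above, stopped just before the split
sized <- mtcars %>%
  group_by(cyl) %>%
  mutate(GroupSize = n()) %>%
  group_by(GroupSize)

pieces <- group_split(sized, .keep = FALSE)

# group_keys() returns one row per group, in the same order as group_split()
names(pieces) <- as.character(group_keys(sized)$GroupSize)

names(pieces)        # the distinct group sizes, as character labels
nrow(pieces[["7"]])  # rows belonging to groups of size 7
```

In mtcars every cyl value happens to have a different count, so each piece holds exactly one cyl group here.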

Answer 3

Score: 0

original_data_frame |>
  mutate(n = n(), .by = grouping_factor) |>
  group_split(n)

Also works.

[Credit to this Reddit comment](https://www.reddit.com/r/Rlanguage/comments/15i45b4/comment/jus2jkv/?utm_source=share&utm_medium=web2x&context=3)



huangapple
  • Posted on August 5, 2023, 00:06:48
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76837573.html