August 5, 2023, 00:06:48

Is there a way to split a grouped dataframe by size?

Question

table_by_size <- original_df %>% group_by(grouping_factor) %>% group_split() gives a nice, clean list_of holding the split data from the original data.frame.

original_df %>% group_by(grouping_factor) %>% summarize(counts = n()) is a very easy way to count the size of all groups.

original_df %>% group_by(grouping_factor) %>% filter(n() == <size #>) lets me keep only the groups of one specific size.

Is there a way for me to combine all of these and split a grouped data frame based on group size? That is, put all groups of size n into one item: all groups of size 3 as one, all groups of size 4 as one, and so forth.

A simple alternative is to loop over the values of n() and build the sub-frames manually, but before I do that I wanted to see whether there is a tidyverse method for this.

Almost everything needed is already in dplyr, but by the looks of it one cannot customize how group_split() does the splitting. Is this right?

Answer 1

Score: 2

Add the size as a column and split by that:

metabolite_table |>
  mutate(group_size = n(), .by = grouping_factor) |>
  group_by(group_size) |>
  group_split()

The names are a bit confusing in this case. Assuming grouping_factor has a more meaningful name, I'd suggest giving group_size a similarly meaningful name.
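
To make the effect concrete, here is a minimal sketch on made-up data. The `toy_df` data frame and its `value` column are assumptions for illustration and do not come from the question, and `.by` needs dplyr 1.1.0 or later. Groups `a` and `b` both have two rows, so they should land together in one list element. The last step swaps in base R's `split()`, which is not part of the answer above, only because it returns a list named by size.

```r
library(dplyr)

# Hypothetical toy data: groups "a" and "b" have 2 rows each, "c" has 3.
toy_df <- data.frame(
  grouping_factor = c("a", "a", "b", "b", "c", "c", "c"),
  value = 1:7
)

# Tag every row with the size of the group it belongs to.
sized_df <- toy_df |>
  mutate(group_size = n(), .by = grouping_factor)

# One list element per distinct group size, ordered by size.
by_size <- sized_df |>
  group_by(group_size) |>
  group_split()

# Base R variant: split() names each element after its group size.
named_by_size <- split(sized_df, sized_df$group_size)

nrow(by_size[[1]])    # rows from groups "a" and "b" together
names(named_by_size)  # the sizes, as character names
```

If the elements need to be looked up by size later, the named variant allows `named_by_size[["3"]]`; the `group_split()` result is unnamed and has to be indexed by position.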

Answer 2

Score: 2

How about this, for example:

library(tidyverse)
mtcars %>%
  group_by(cyl) %>%
  mutate(GroupSize = n()) %>%
  group_by(GroupSize) %>%
  group_split(.keep = FALSE)
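
As a quick check of what this returns, the sketch below stores the result and inspects it. The name `pieces` is an assumption for illustration. It loads only dplyr, which is enough for these calls. The second `group_by()` replaces the grouping on `cyl`, so `.keep = FALSE` drops only the `GroupSize` column.

```r
library(dplyr)

pieces <- mtcars %>%
  group_by(cyl) %>%
  mutate(GroupSize = n()) %>%
  group_by(GroupSize) %>%
  group_split(.keep = FALSE)

# Number of rows in each piece, ordered by group size.
sapply(pieces, nrow)

# .keep = FALSE removes the grouping column from every piece.
"GroupSize" %in% names(pieces[[1]])

# cyl is no longer a grouping column at this point, so it is kept.
"cyl" %in% names(pieces[[1]])
```

In `mtcars` the three `cyl` groups all have different sizes, so each piece holds exactly one `cyl` value. Groups that share a size would be combined into a single piece.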

Answer 3

Score: 0

original_data_frame |>
  mutate(n = n(), .by = grouping_factor) |>
  group_split(n)

This also works. Credit to [this Reddit comment](https://www.reddit.com/r/Rlanguage/comments/15i45b4/comment/jus2jkv/?utm_source=share&utm_medium=web2x&context=3).
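
A closely related variant, offered as a sketch and not as something any of the answers proposed, uses dplyr's `add_count()` to attach the group size in one step. The `toy_df` data and the column name `group_size` are assumptions for illustration.

```r
library(dplyr)

# Hypothetical toy data: groups "a" and "b" have 2 rows each, "c" has 3.
toy_df <- data.frame(
  grouping_factor = c("a", "a", "b", "b", "c", "c", "c"),
  value = 1:7
)

by_size <- toy_df |>
  # add_count() appends the per-group row count as a new column.
  add_count(grouping_factor, name = "group_size") |>
  # On an ungrouped data frame, group_split() accepts the grouping column directly.
  group_split(group_size)

length(by_size)  # one element per distinct group size
```

Passing the column straight to `group_split()` avoids the separate `group_by()` call, at the cost of being slightly less explicit about the grouping step.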
