I am looking for a shorter function to group similar datasets from a list


Question

I have data frames scattered across a list. I would like to group the data frames that belong to the same year:

mydata12<-data.frame(Age=c(12,13),Sex=c("F","H"), Weight=c(70,75),year=c(2012))
mydata13<-data.frame(Age=c(14,15),Sex=c("F","H"), Weight=c(70,75),year=c(2013))
mydata14<-data.frame(Age=c(16,17),Sex=c("F","H"), Weight=c(70,75),year=c(2014))
mydata2012<-data.frame(Age=c(18,19),Sex=c("F","H"), Weight=c(70,75),year=c(2012))
mydata2013<-data.frame(Age=c(20,13),Sex=c("H","H"), Weight=c(70,75),year=c(2013))
mydata2014<-data.frame(Age=c(22,13),Sex=c("F","F"), Weight=c(70,75),year=c(2014))

mydatalist<-list(

  `12`=mydata12,
  `13`=mydata13,
  `14`=mydata14,
  `2013`=mydata2012,
  `2014`=mydata2013,
  `2015`=mydata2014
)

I can do it with this code:

list(`2012`=rbind(mydatalist$`12`,mydata2012),
     `2013`=rbind(mydatalist$`13`,mydata2013),
     `2014`=rbind(mydatalist$`14`,mydata2014))

but I would like to make it shorter (without one line of code per year), since the names already follow the patterns 2012:2021 and

12:21
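A minimal sketch of the same pairing without one line per year, assuming the objects are named exactly as above (`mydatalist$`12``, ... and `mydata2012`, ...) and live in the global environment. It loops over the two-digit suffixes and fetches each matching `mydata20xx` object by name with `get()`; `yy` is a hypothetical helper variable, not something from the question.

```r
# Example data from the question
mydata12   <- data.frame(Age = c(12, 13), Sex = c("F", "H"), Weight = c(70, 75), year = 2012)
mydata13   <- data.frame(Age = c(14, 15), Sex = c("F", "H"), Weight = c(70, 75), year = 2013)
mydata14   <- data.frame(Age = c(16, 17), Sex = c("F", "H"), Weight = c(70, 75), year = 2014)
mydata2012 <- data.frame(Age = c(18, 19), Sex = c("F", "H"), Weight = c(70, 75), year = 2012)
mydata2013 <- data.frame(Age = c(20, 13), Sex = c("H", "H"), Weight = c(70, 75), year = 2013)
mydata2014 <- data.frame(Age = c(22, 13), Sex = c("F", "F"), Weight = c(70, 75), year = 2014)
mydatalist <- list(`12` = mydata12, `13` = mydata13, `14` = mydata14,
                   `2013` = mydata2012, `2014` = mydata2013, `2015` = mydata2014)

# Hypothetical helper: two-digit suffixes to process (use 12:21 for 2012-2021)
yy <- 12:14

# For each suffix, stack the list element with the matching mydata20xx object
out <- lapply(yy, function(y) {
  rbind(mydatalist[[as.character(y)]],   # e.g. mydatalist$`12`
        get(paste0("mydata20", y)))      # e.g. mydata2012, looked up by name
})
names(out) <- paste0("20", yy)           # "2012", "2013", "2014"
```

Extending `yy` to `12:21` assumes every `mydata2015` ... `mydata2021` object exists; `get()` stops with an error for any missing one.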

Answer 1

Score: 3

We could use bind_rows() to stack the data frames and then split() them by year:

library(dplyr)

bind_rows(mydatalist) %>%
  split(f = as.factor(.$year))
$`2012`
  Age Sex Weight year
1  12   F     70 2012
2  13   H     75 2012
7  18   F     70 2012
8  19   H     75 2012

$`2013`
   Age Sex Weight year
3   14   F     70 2013
4   15   H     75 2013
9   20   H     70 2013
10  13   H     75 2013

$`2014`
   Age Sex Weight year
5   16   F     70 2014
6   17   H     75 2014
11  22   F     70 2014
12  13   F     75 2014
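The `group_split()` function from dplyr gives a similar result. A sketch of that variant, assuming dplyr is installed: `group_split()` returns an unnamed list of tibbles (one per year, in sorted year order), so the year names are attached by hand here, and converting each piece with `as.data.frame()` is an optional choice rather than part of the answer.

```r
library(dplyr)

# Example data from the question
mydata12   <- data.frame(Age = c(12, 13), Sex = c("F", "H"), Weight = c(70, 75), year = 2012)
mydata13   <- data.frame(Age = c(14, 15), Sex = c("F", "H"), Weight = c(70, 75), year = 2013)
mydata14   <- data.frame(Age = c(16, 17), Sex = c("F", "H"), Weight = c(70, 75), year = 2014)
mydata2012 <- data.frame(Age = c(18, 19), Sex = c("F", "H"), Weight = c(70, 75), year = 2012)
mydata2013 <- data.frame(Age = c(20, 13), Sex = c("H", "H"), Weight = c(70, 75), year = 2013)
mydata2014 <- data.frame(Age = c(22, 13), Sex = c("F", "F"), Weight = c(70, 75), year = 2014)
mydatalist <- list(`12` = mydata12, `13` = mydata13, `14` = mydata14,
                   `2013` = mydata2012, `2014` = mydata2013, `2015` = mydata2014)

combined <- bind_rows(mydatalist)   # stack all data frames into one

# One tibble per year (ordered by year, but unnamed); convert to plain data frames
by_year <- lapply(group_split(combined, year), as.data.frame)
names(by_year) <- sort(unique(combined$year))   # "2012", "2013", "2014"
```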

Answer 2

Score: 3

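In base R, either take a substring of the list names (i.e. the last 2 digits) or add 20 as a prefix to the names that have only 2 characters, then split the list by those names and rbind each group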

out <- lapply(split(mydatalist, sub("^(\\d{2})$", "20\\1", names(mydatalist))),
              \(x) `row.names<-`(do.call(rbind, x), NULL))

-output

> out
$`2012`
  Age Sex Weight year
1  12   F     70 2012
2  13   H     75 2012
3  18   F     70 2012
4  19   H     75 2012

$`2013`
  Age Sex Weight year
1  14   F     70 2013
2  15   H     75 2013
3  20   H     70 2013
4  13   H     75 2013

$`2014`
  Age Sex Weight year
1  16   F     70 2014
2  17   H     75 2014
3  22   F     70 2014
4  13   F     75 2014
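The renaming step relies on the regex backreference `\\1`, which reinserts the two captured digits after the literal `20`. A small sketch on illustrative names (not the question's list), also showing the alternative the answer mentions, keeping only the last two characters with `substring()`:

```r
nm <- c("12", "13", "2012", "2013")   # illustrative list names

# Prefix "20" only to names that are exactly two digits; others are unchanged
full <- sub("^(\\d{2})$", "20\\1", nm)
full
# [1] "2012" "2013" "2012" "2013"

# Alternatively, keep only the last two characters of every name
short <- substring(nm, nchar(nm) - 1)
short
# [1] "12" "13" "12" "13"
```

Either normalised vector can then serve as the grouping factor for `split()`.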



# Answer 3
**Score**: 3

I am not sure how "short" you want it to be, but the following might be an option:

> split(`row.names<-`(do.call(rbind, mydatalist), NULL), ~year)
$`2012`
  Age Sex Weight year
1  12   F     70 2012
2  13   H     75 2012
7  18   F     70 2012
8  19   H     75 2012

$`2013`
   Age Sex Weight year
3   14   F     70 2013
4   15   H     75 2013
9   20   H     70 2013
10  13   H     75 2013

$`2014`
   Age Sex Weight year
5   16   F     70 2014
6   17   H     75 2014
11  22   F     70 2014
12  13   F     75 2014
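Passing a one-sided formula such as `~year` to `split()` on a data frame, like the `\(x)` lambda in the previous answer, needs a recent R (4.1.0 or later, as far as I know). For older versions, a sketch of the same idea spells both steps out; `combined` and `by_year` are illustrative names.

```r
# Example data from the question
mydata12   <- data.frame(Age = c(12, 13), Sex = c("F", "H"), Weight = c(70, 75), year = 2012)
mydata13   <- data.frame(Age = c(14, 15), Sex = c("F", "H"), Weight = c(70, 75), year = 2013)
mydata14   <- data.frame(Age = c(16, 17), Sex = c("F", "H"), Weight = c(70, 75), year = 2014)
mydata2012 <- data.frame(Age = c(18, 19), Sex = c("F", "H"), Weight = c(70, 75), year = 2012)
mydata2013 <- data.frame(Age = c(20, 13), Sex = c("H", "H"), Weight = c(70, 75), year = 2013)
mydata2014 <- data.frame(Age = c(22, 13), Sex = c("F", "F"), Weight = c(70, 75), year = 2014)
mydatalist <- list(`12` = mydata12, `13` = mydata13, `14` = mydata14,
                   `2013` = mydata2012, `2014` = mydata2013, `2015` = mydata2014)

combined <- do.call(rbind, mydatalist)     # stack; row names become "12.1", "12.2", ...
rownames(combined) <- NULL                 # same effect as `row.names<-`(combined, NULL)
by_year <- split(combined, combined$year)  # list named "2012", "2013", "2014"
```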






huangapple
  • Posted on 13 April 2023, 20:06:38
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76005206.html