I am looking for a shorter function to group similar datasets from a list

Question

I have data frames scattered across a list, and I would like to group the data frames from the same year:

    mydata12 <- data.frame(Age=c(12,13), Sex=c("F","H"), Weight=c(70,75), year=c(2012))
    mydata13 <- data.frame(Age=c(14,15), Sex=c("F","H"), Weight=c(70,75), year=c(2013))
    mydata14 <- data.frame(Age=c(16,17), Sex=c("F","H"), Weight=c(70,75), year=c(2014))
    mydata2012 <- data.frame(Age=c(18,19), Sex=c("F","H"), Weight=c(70,75), year=c(2012))
    mydata2013 <- data.frame(Age=c(20,13), Sex=c("H","H"), Weight=c(70,75), year=c(2013))
    mydata2014 <- data.frame(Age=c(22,13), Sex=c("F","F"), Weight=c(70,75), year=c(2014))
    mydatalist <- list(
      `12`=mydata12,
      `13`=mydata13,
      `14`=mydata14,
      `2012`=mydata2012,
      `2013`=mydata2013,
      `2014`=mydata2014
    )

I can do it with this code:

    list(`2012`=rbind(mydatalist$`12`, mydata2012),
         `2013`=rbind(mydatalist$`13`, mydata2013),
         `2014`=rbind(mydatalist$`14`, mydata2014))

but I would like to make it shorter (without a line of code for each year), since the list names already follow the patterns 2012:2021 and 12:21.

Answer 1

Score: 3

We could use bind_rows with split:

    library(dplyr)
    bind_rows(mydatalist) %>%
      split(f = as.factor(.$year))

-output

    $`2012`
      Age Sex Weight year
    1  12   F     70 2012
    2  13   H     75 2012
    7  18   F     70 2012
    8  19   H     75 2012

    $`2013`
       Age Sex Weight year
    3   14   F     70 2013
    4   15   H     75 2013
    9   20   H     70 2013
    10  13   H     75 2013

    $`2014`
       Age Sex Weight year
    5   16   F     70 2014
    6   17   H     75 2014
    11  22   F     70 2014
    12  13   F     75 2014

Answer 2

Score: 3

In base R, first bring the list names into one format, either by taking the last 2 digits of each name or, as below, by adding 20 as a prefix to the names that have only 2 characters. Then split the list by those names and rbind each group:

    out <- lapply(split(mydatalist, sub("^(\\d{2})$", "20\\1",
        names(mydatalist))), \(x) `row.names<-`(do.call(rbind, x), NULL))

-output

    > out
    $`2012`
      Age Sex Weight year
    1  12   F     70 2012
    2  13   H     75 2012
    3  18   F     70 2012
    4  19   H     75 2012

    $`2013`
      Age Sex Weight year
    1  14   F     70 2013
    2  15   H     75 2013
    3  20   H     70 2013
    4  13   H     75 2013

    $`2014`
      Age Sex Weight year
    1  16   F     70 2014
    2  17   H     75 2014
    3  22   F     70 2014
    4  13   F     75 2014
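
The answer also mentions keying on the last 2 digits of each list name, which its code does not show. Below is a minimal sketch of that variant, assuming every list name ends in the two-digit year and that all years fall in the 2000s. The names `key` and `out2` are made up for this illustration, and the data is rebuilt inline so the block runs on its own.

```r
# Example data modelled on the question; list names are assumed to be
# the two-digit or four-digit form of the year.
mydatalist <- list(
  `12`   = data.frame(Age = c(12, 13), Sex = c("F", "H"), Weight = c(70, 75), year = 2012),
  `13`   = data.frame(Age = c(14, 15), Sex = c("F", "H"), Weight = c(70, 75), year = 2013),
  `14`   = data.frame(Age = c(16, 17), Sex = c("F", "H"), Weight = c(70, 75), year = 2014),
  `2012` = data.frame(Age = c(18, 19), Sex = c("F", "H"), Weight = c(70, 75), year = 2012),
  `2013` = data.frame(Age = c(20, 13), Sex = c("H", "H"), Weight = c(70, 75), year = 2013),
  `2014` = data.frame(Age = c(22, 13), Sex = c("F", "F"), Weight = c(70, 75), year = 2014)
)

# Keep only the last two characters of each name: "2012" -> "12", "12" -> "12"
key <- sub("^.*(.{2})$", "\\1", names(mydatalist))

# Split the list by that key, then row-bind each group and reset row names
out2 <- lapply(split(mydatalist, key), function(x) {
  res <- do.call(rbind, x)
  row.names(res) <- NULL
  res
})

# Assumption: every year is in the 2000s, so "20" restores the full year
names(out2) <- paste0("20", names(out2))
```

Compared with the prefix approach, this one never rewrites the four-digit names, but it would merge years from different centuries (for example 1912 and 2012), so it only fits data where that cannot happen.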

Answer 3

Score: 3

I am not sure how "short" you are happy with, but below might be an option:

    > split(`row.names<-`(do.call(rbind, mydatalist), NULL), ~year)
    $`2012`
      Age Sex Weight year
    1  12   F     70 2012
    2  13   H     75 2012
    7  18   F     70 2012
    8  19   H     75 2012

    $`2013`
       Age Sex Weight year
    3   14   F     70 2013
    4   15   H     75 2013
    9   20   H     70 2013
    10  13   H     75 2013

    $`2014`
       Age Sex Weight year
    5   16   F     70 2014
    6   17   H     75 2014
    11  22   F     70 2014
    12  13   F     75 2014

huangapple
  • Posted on April 13, 2023 at 20:06:38
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76005206.html