快速在R中按年份拆分数据框。

huangapple go评论94阅读模式
英文:

Quickly split a dataframe by year in R

问题

我有一个看起来像这样的面板

  1. country <- c("A","B","C","A","B","C","A","B","C")
  2. industry<- c("X","Y","Z","X","Y","Z","X","Y","Z")
  3. x2006<- sample(1000:100000,9)
  4. x2007<- sample(1000:100000,9)
  5. x2008<- sample(1000:100000,9)
  6. dat <- data.frame (country,industry,x2006,x2007,x2008)

我正在做一些非常简单的事情,比如

  1. dat2006 <- dat%>%
  2. select(country,industry,x2006)

然后使用 write.csv 将其保存为自己的文件

如果我想要重复这个操作,并为数据集中的每一年(即列)保存一个单独的文件,最好的方法是什么?

英文:

I have a panel that looks like this

  1. country <- c("A","B","C","A","B","C","A","B","C")
  2. industry<- c("X","Y","Z","X","Y","Z","X","Y","Z")
  3. x2006<- sample(1000:100000,9)
  4. x2007<- sample(1000:100000,9)
  5. x2008<- sample(1000:100000,9)
  6. dat <- data.frame (country,industry,x2006,x2007,x2008)

I am doing something very simple like

  1. dat2006 <- dat%>%
  2. select(country,industry,x2006)

then using write.csv to save it as its own file

What is the best way to do this if I wanted to repeat that and save a separate file for each year (i.e. column) in the data set?

答案1

得分: 2

你可以使用 sapply

  1. sapply(grep("x", names(dat)), function(y)
  2. write.csv(dat[, c(1, 2, y)],
  3. paste0(names(dat[y]), ".csv"),
  4. row.names = FALSE)
  5. )

grep会找到具有 x 的列,sapply会遍历这些列。这将以所选列名为文件名并将其保存在工作目录中。

请注意,您还可以以其他方式指定列。以下是一些替代方法:

  1. # 直接使用列位置(数字)
  2. sapply(3:5, function(y)
  3. write.csv(dat[, c(1, 2, y)],
  4. paste0(names(dat[y]), ".csv"),
  5. row.names = FALSE)
  6. )
  7. # 使用列名
  8. sapply(c("x2006", "x2007", "x2008"), function(y)
  9. write.csv(dat[, c("country", "industry", y)],
  10. paste0(names(dat[y]), ".csv"),
  11. row.names = FALSE))
英文:

You could use sapply:

  1. sapply(grep("x", names(dat)), function(y)
  2. write.csv(dat[, c(1, 2, y)],
  3. paste0(names(dat[y]), ".csv"),
  4. row.names = FALSE)
  5. )

grep finds the columns with x, sapply loops through them. This will name your csv file the column name selected and save it in the working directory.

Note, you could specify the columns in other ways too. A few alternatives:

  1. # using column locations (numbers) directly
  2. sapply(3:5, function(y)
  3. write.csv(dat[, c(1, 2, y)],
  4. paste0(names(dat[y]), ".csv"),
  5. row.names = FALSE)
  6. )
  7. # using column names
  8. sapply(c("x2006", "x2007", "x2008"), function(y)
  9. write.csv(dat[, c("country", "industry", y)],
  10. paste0(names(dat[y]), ".csv"),
  11. row.names = FALSE))

答案2

得分: 1

以下是一个使用 tidyverse 的解决方案:

  1. library(tidyverse)
  2. dat %>%
  3. pivot_longer(cols = starts_with("x"), names_to = "year", values_to = "value") %>%
  4. split(.$year) %>%
  5. map(~ select(.x, country, industry, value)) %>%
  6. map2(names(.), ~ write_csv(.x, file = paste0("dat_", .y, ".csv")))
  1. $x2006
  2. # A tibble: 9 × 3
  3. country industry value
  4. <chr> <chr> <int>
  5. 1 A X 99954
  6. 2 B Y 27955
  7. 3 C Z 36009
  8. 4 A X 3061
  9. 5 B Y 25612
  10. 6 C Z 67307
  11. 7 A X 96514
  12. 8 B Y 97864
  13. 9 C Z 43014
  14. $x2007
  15. # A tibble: 9 × 3
  16. country industry value
  17. <chr> <chr> <int>
  18. 1 A X 83954
  19. 2 B Y 96141
  20. 3 C Z 62389
  21. 4 A X 28568
  22. 5 B Y 77503
  23. 6 C Z 70458
  24. 7 A X 34978
  25. 8 B Y 35408
  26. 9 C Z 68731
  27. $x2008
  28. # A tibble: 9 × 3
  29. country industry value
  30. <chr> <chr> <int>
  31. 1 A X 29498
  32. 2 B Y 62203
  33. 3 C Z 98125
  34. 4 A X 99549
  35. 5 B Y 56839
  36. 6 C Z 21621
  37. 7 A X 84214
  38. 8 B Y 85778
  39. 9 C Z 90275

注意:以上代码部分是 R 代码,不进行翻译。

英文:

Here is a tidyverse solution:

  1. library(tidyverse)
  2. dat %&gt;%
  3. pivot_longer(cols = starts_with(&quot;x&quot;), names_to = &quot;year&quot;, values_to = &quot;value&quot;) %&gt;%
  4. split(.$year) %&gt;%
  5. map(~ select(.x, country, industry, value)) %&gt;%
  6. map2(names(.), ~ write_csv(.x, file = paste0(&quot;dat_&quot;, .y, &quot;.csv&quot;)))
  1. $x2006
  2. # A tibble: 9 &#215; 3
  3. country industry value
  4. &lt;chr&gt; &lt;chr&gt; &lt;int&gt;
  5. 1 A X 99954
  6. 2 B Y 27955
  7. 3 C Z 36009
  8. 4 A X 3061
  9. 5 B Y 25612
  10. 6 C Z 67307
  11. 7 A X 96514
  12. 8 B Y 97864
  13. 9 C Z 43014
  14. $x2007
  15. # A tibble: 9 &#215; 3
  16. country industry value
  17. &lt;chr&gt; &lt;chr&gt; &lt;int&gt;
  18. 1 A X 83954
  19. 2 B Y 96141
  20. 3 C Z 62389
  21. 4 A X 28568
  22. 5 B Y 77503
  23. 6 C Z 70458
  24. 7 A X 34978
  25. 8 B Y 35408
  26. 9 C Z 68731
  27. $x2008
  28. # A tibble: 9 &#215; 3
  29. country industry value
  30. &lt;chr&gt; &lt;chr&gt; &lt;int&gt;
  31. 1 A X 29498
  32. 2 B Y 62203
  33. 3 C Z 98125
  34. 4 A X 99549
  35. 5 B Y 56839
  36. 6 C Z 21621
  37. 7 A X 84214
  38. 8 B Y 85778
  39. 9 C Z 90275

huangapple
  • 本文由 发表于 2023年6月19日 10:54:12
  • 转载请务必保留本文链接:https://go.coder-hub.com/76503355.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定