Quickly split a dataframe by year in R

Question

I have a panel that looks like this:

country <- c("A","B","C","A","B","C","A","B","C")
industry <- c("X","Y","Z","X","Y","Z","X","Y","Z")
x2006 <- sample(1000:100000, 9)
x2007 <- sample(1000:100000, 9)
x2008 <- sample(1000:100000, 9)
dat <- data.frame(country, industry, x2006, x2007, x2008)

I am doing something very simple, like

library(dplyr)

dat2006 <- dat %>%
    select(country, industry, x2006)

and then using write.csv to save the result as its own file.

What is the best way to repeat this and save a separate file for each year (i.e. each year column) in the data set?

Answer 1

Score: 2

You could use sapply:

sapply(grep("x", names(dat)), function(y)
       write.csv(dat[, c(1, 2, y)], 
                 paste0(names(dat[y]), ".csv"),
                 row.names = FALSE)
)

grep returns the positions of the columns whose names contain "x", and sapply loops over those positions. Each CSV file is named after the selected column and saved in the working directory.
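One thing to watch with the pattern: a bare "x" matches any column name that contains that letter, not only the year columns. A minimal sketch of the difference, using a made-up vector of column names (the tax column is hypothetical and is not part of the question's data):

```r
# Hypothetical column names; "tax" is invented to show an accidental match.
cols <- c("country", "industry", "tax", "x2006", "x2007")

# Unanchored: every name containing "x" is returned, including "tax".
loose <- grep("x", cols)

# Anchored: only names that are "x" followed by exactly four digits.
strict <- grep("^x[0-9]{4}$", cols)

print(cols[loose])   # tax, x2006, x2007
print(cols[strict])  # x2006, x2007
```

With the data in the question both patterns select the same three columns, so the anchored form only matters if the panel later gains other columns with an x in their names.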

Note, you could specify the columns in other ways too. A few alternatives:

# using column locations (numbers) directly
sapply(3:5, function(y)
  write.csv(dat[, c(1, 2, y)], 
            paste0(names(dat[y]), ".csv"),
            row.names = FALSE)
)

# using column names (y is already the column name, so it doubles as the file name)
sapply(c("x2006", "x2007", "x2008"), function(y)
       write.csv(dat[, c("country", "industry", y)], 
                 paste0(y, ".csv"),
                 row.names = FALSE))
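If the same export is needed more than once, the loop could be wrapped in a small function. The following is only a sketch: write_year_files, its arguments, and the use of tempdir() as the output folder are assumptions made for illustration, not part of the answer above. A for loop is used because the files are written purely for their side effect, which also avoids the list of NULLs that sapply prints.

```r
# Hypothetical helper: write one CSV per year column and return the file paths.
write_year_files <- function(df, id_cols, year_cols, out_dir = ".") {
  paths <- file.path(out_dir, paste0(year_cols, ".csv"))
  for (i in seq_along(year_cols)) {
    # Keep the identifier columns plus the current year column.
    write.csv(df[, c(id_cols, year_cols[i])], paths[i], row.names = FALSE)
  }
  invisible(paths)
}

# Example panel shaped like the one in the question (seed added for reproducibility).
set.seed(1)
dat <- data.frame(
  country  = rep(c("A", "B", "C"), times = 3),
  industry = rep(c("X", "Y", "Z"), times = 3),
  x2006 = sample(1000:100000, 9),
  x2007 = sample(1000:100000, 9),
  x2008 = sample(1000:100000, 9)
)

# tempdir() is used only so the sketch does not clutter the working directory.
written <- write_year_files(dat, c("country", "industry"),
                            c("x2006", "x2007", "x2008"), out_dir = tempdir())
print(basename(written))
```

Returning the paths invisibly makes it easy to check afterwards which files were created, for example with file.exists(written).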

Answer 2

Score: 1

Here is a tidyverse solution:

library(tidyverse)

dat %>%
  pivot_longer(cols = starts_with("x"), names_to = "year", values_to = "value") %>%
  split(.$year) %>%
  map(~ select(.x, country, industry, value)) %>%
  map2(names(.), ~ write_csv(.x, file = paste0("dat_", .y, ".csv")))

$x2006
# A tibble: 9 × 3
  country industry value
  <chr>   <chr>    <int>
1 A       X        99954
2 B       Y        27955
3 C       Z        36009
4 A       X         3061
5 B       Y        25612
6 C       Z        67307
7 A       X        96514
8 B       Y        97864
9 C       Z        43014

$x2007
# A tibble: 9 × 3
  country industry value
  <chr>   <chr>    <int>
1 A       X        83954
2 B       Y        96141
3 C       Z        62389
4 A       X        28568
5 B       Y        77503
6 C       Z        70458
7 A       X        34978
8 B       Y        35408
9 C       Z        68731

$x2008
# A tibble: 9 × 3
  country industry value
  <chr>   <chr>    <int>
1 A       X        29498
2 B       Y        62203
3 C       Z        98125
4 A       X        99549
5 B       Y        56839
6 C       Z        21621
7 A       X        84214
8 B       Y        85778
9 C       Z        90275
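The pipeline above prints every tibble because map2 returns the written data as a list. If only the files are wanted, one possible variation is purrr's iwalk, which calls a function with each list element and its name and returns its input invisibly. This is a sketch under assumptions: the seed, the by_year name, and tempdir() as the output folder are added here for illustration.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(readr)

# Example panel shaped like the one in the question (seed added for reproducibility).
set.seed(1)
dat <- data.frame(
  country  = rep(c("A", "B", "C"), times = 3),
  industry = rep(c("X", "Y", "Z"), times = 3),
  x2006 = sample(1000:100000, 9),
  x2007 = sample(1000:100000, 9),
  x2008 = sample(1000:100000, 9)
)

# Hypothetical output folder; tempdir() keeps the working directory clean.
out_dir <- tempdir()

# Reshape to long format and split into one tibble per year column.
by_year <- dat %>%
  pivot_longer(cols = starts_with("x"), names_to = "year", values_to = "value") %>%
  split(.$year)

# iwalk passes each tibble and its list name (e.g. "x2006") to the function.
iwalk(by_year, function(df, year) {
  write_csv(select(df, country, industry, value),
            file = file.path(out_dir, paste0("dat_", year, ".csv")))
})
```

The files end up with the same dat_x2006.csv style names as in the answer, but nothing is printed to the console.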

huangapple
  • Posted on June 19, 2023 at 10:54:12
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76503355.html