

Quickly split a dataframe by year in R



I have a panel that looks like this

country <- c("A","B","C","A","B","C","A","B","C")
industry<- c("X","Y","Z","X","Y","Z","X","Y","Z")
x2006<- sample(1000:100000,9)
x2007<- sample(1000:100000,9)
x2008<- sample(1000:100000,9)
dat <- data.frame (country,industry,x2006,x2007,x2008)  

I am doing something very simple like

dat2006 <- dat%>%

then using write.csv to save it as its own file

What is the best way to repeat this and save a separate file for each year (i.e., each column) in the data set?


Score: 2

You could use sapply:

# write country, industry, and one year column to each file
sapply(grep("x", names(dat)), function(y)
       write.csv(dat[, c(1, 2, y)], 
                 paste0(names(dat[y]), ".csv"),
                 row.names = FALSE))

grep finds the positions of the columns whose names contain x, and sapply loops over them. Each csv file is named after the selected column and saved in the working directory.
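One caveat with the plain "x" pattern is that it matches any column name containing an x, not only the year columns. A minimal sketch of a stricter pattern follows; the column name "tax" is hypothetical and is not part of the question's data.

```r
# Hypothetical column names: "tax" is not a year column but contains an "x"
cols <- c("country", "industry", "tax", "x2006", "x2007", "x2008")

# Plain pattern: every name containing "x" matches, including "tax"
loose <- grep("x", cols)

# Anchored pattern: only names made of "x" followed by exactly four digits
strict <- grep("^x[0-9]{4}$", cols)

# value = TRUE returns the matching names instead of their positions
strict_names <- grep("^x[0-9]{4}$", cols, value = TRUE)
```

With the question's data the two patterns select the same columns, so the anchored version only matters if other column names happen to contain an x.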

Note, you could specify the columns in other ways too. A few alternatives:

# using column locations (numbers) directly
sapply(3:5, function(y)
  write.csv(dat[, c(1, 2, y)], 
            paste0(names(dat[y]), ".csv"),
            row.names = FALSE))

# using column names
sapply(c("x2006", "x2007", "x2008"), function(y)
       write.csv(dat[, c("country", "industry", y)], 
                 paste0(names(dat[y]), ".csv"),
                 row.names = FALSE))
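Because write.csv is called only for its side effect, the sapply calls above typically print a list of NULL values at the console. The sketch below is one possible variant, not the answer's own code: it wraps the loop in invisible(), writes into a directory of your choice, and reads one file back as a check. The names out_dir, year_cols, and check are assumptions made for illustration, and tempdir() is used only so the sketch does not touch the working directory.

```r
# Toy data with the same layout as in the question
set.seed(1)
dat <- data.frame(
  country  = rep(c("A", "B", "C"), 3),
  industry = rep(c("X", "Y", "Z"), 3),
  x2006    = sample(1000:100000, 9),
  x2007    = sample(1000:100000, 9),
  x2008    = sample(1000:100000, 9)
)

# Assumed output location; replace with any existing directory
out_dir <- tempdir()
year_cols <- c("x2006", "x2007", "x2008")

# invisible() suppresses the list of NULLs that the loop would otherwise print
invisible(lapply(year_cols, function(y) {
  write.csv(dat[, c("country", "industry", y)],
            file.path(out_dir, paste0(y, ".csv")),
            row.names = FALSE)
}))

# Read one file back to confirm it holds the two id columns plus one year
check <- read.csv(file.path(out_dir, "x2006.csv"))
```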


Score: 1

Here is a tidyverse solution:


library(tidyverse)

dat %>%
  pivot_longer(cols = starts_with("x"), names_to = "year", values_to = "value") %>%
  split(.$year) %>%
  map(~ select(.x, country, industry, value)) %>%
  map2(names(.), ~ write_csv(.x, file = paste0("dat_", .y, ".csv")))
# A tibble: 9 × 3
  country industry value
  <chr>   <chr>    <int>
1 A       X        99954
2 B       Y        27955
3 C       Z        36009
4 A       X         3061
5 B       Y        25612
6 C       Z        67307
7 A       X        96514
8 B       Y        97864
9 C       Z        43014

# A tibble: 9 × 3
  country industry value
  <chr>   <chr>    <int>
1 A       X        83954
2 B       Y        96141
3 C       Z        62389
4 A       X        28568
5 B       Y        77503
6 C       Z        70458
7 A       X        34978
8 B       Y        35408
9 C       Z        68731

# A tibble: 9 × 3
  country industry value
  <chr>   <chr>    <int>
1 A       X        29498
2 B       Y        62203
3 C       Z        98125
4 A       X        99549
5 B       Y        56839
6 C       Z        21621
7 A       X        84214
8 B       Y        85778
9 C       Z        90275
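The tibbles printed above appear because map2() returns what write_csv() hands back. If only the files are wanted, purrr's iwalk() is one alternative worth considering, since it is intended for side effects and passes each list element as .x and its name as .y. The following is a sketch under assumptions rather than the answer's own code: out_dir and by_year are illustrative names, and tempdir() stands in for whatever directory you actually want.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(readr)

# Toy data with the same layout as in the question
set.seed(1)
dat <- data.frame(
  country  = rep(c("A", "B", "C"), 3),
  industry = rep(c("X", "Y", "Z"), 3),
  x2006    = sample(1000:100000, 9),
  x2007    = sample(1000:100000, 9),
  x2008    = sample(1000:100000, 9)
)

# Assumed output location; replace with any existing directory
out_dir <- tempdir()

# Reshape to long format, then split into one data frame per year column
by_year <- dat %>%
  pivot_longer(cols = starts_with("x"), names_to = "year", values_to = "value") %>%
  split(.$year) %>%
  map(~ select(.x, country, industry, value))

# Write each piece to dat_<name>.csv without printing the results
iwalk(by_year, ~ write_csv(.x, file.path(out_dir, paste0("dat_", .y, ".csv"))))
```

Keeping the split list in by_year also makes it easy to inspect a single year, for example by_year[["x2006"]], before anything is written.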

  • Posted on June 19, 2023, 10:54:12
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76503355.html



