Quickly split a dataframe by year in R
Question
I have a panel that looks like this:
country <- c("A","B","C","A","B","C","A","B","C")
industry <- c("X","Y","Z","X","Y","Z","X","Y","Z")
x2006 <- sample(1000:100000, 9)
x2007 <- sample(1000:100000, 9)
x2008 <- sample(1000:100000, 9)
dat <- data.frame(country, industry, x2006, x2007, x2008)
I am doing something very simple, like
library(dplyr)
dat2006 <- dat %>%
  select(country, industry, x2006)
and then using write.csv to save the result as its own file.
What is the best way to repeat this and save a separate file for each year (i.e. each x column) in the data set?
Answer 1
Score: 2
You could use sapply:
sapply(grep("^x", names(dat)), function(y)
  write.csv(dat[, c(1, 2, y)],
            paste0(names(dat)[y], ".csv"),
            row.names = FALSE)
)
grep("^x", names(dat)) returns the positions of the columns whose names start with x, and sapply loops over them. Each csv file is named after the selected column (x2006.csv and so on) and is saved in the working directory.
Note that you could specify the columns in other ways too. A few alternatives:
# using column positions (numbers) directly
sapply(3:5, function(y)
  write.csv(dat[, c(1, 2, y)],
            paste0(names(dat)[y], ".csv"),
            row.names = FALSE)
)
# using column names; here y is already the name, so it can be used for the file name
sapply(c("x2006", "x2007", "x2008"), function(y)
  write.csv(dat[, c("country", "industry", y)],
            paste0(y, ".csv"),
            row.names = FALSE)
)
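The same idea can also be written as a plain for loop, which avoids the list of return values that sapply collects. This is only a sketch: the seed, the output folder out_dir (a temporary directory here) and the stricter pattern "^x[0-9]{4}$" are assumptions added for illustration and are not part of the original answer.

```r
# Rebuild the example panel from the question (seed added so the sketch is reproducible)
set.seed(1)
dat <- data.frame(
  country  = rep(c("A", "B", "C"), 3),
  industry = rep(c("X", "Y", "Z"), 3),
  x2006    = sample(1000:100000, 9),
  x2007    = sample(1000:100000, 9),
  x2008    = sample(1000:100000, 9)
)

# Hypothetical output folder; a temp directory keeps the working directory clean
out_dir <- file.path(tempdir(), "by_year")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

# Year columns: names made of "x" followed by exactly four digits
year_cols <- grep("^x[0-9]{4}$", names(dat), value = TRUE)

# Write one csv per year column, keeping the two identifier columns
for (col in year_cols) {
  out_file <- file.path(out_dir, paste0(col, ".csv"))
  write.csv(dat[, c("country", "industry", col)], out_file, row.names = FALSE)
}

# List what was written
list.files(out_dir)
```

Replacing out_dir with "." would write the files to the working directory, as the sapply versions do.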
Answer 2
Score: 1
Here is a tidyverse solution:
library(tidyverse)
dat %>%
  pivot_longer(cols = starts_with("x"), names_to = "year", values_to = "value") %>%
  split(.$year) %>%
  map(~ select(.x, country, industry, value)) %>%
  map2(names(.), ~ write_csv(.x, file = paste0("dat_", .y, ".csv")))
This writes dat_x2006.csv, dat_x2007.csv and dat_x2008.csv to the working directory. Because write_csv returns its input invisibly, map2 also returns the list of per-year tibbles, which is printed below (the values differ from run to run because sample is not seeded):
$x2006
# A tibble: 9 × 3
  country industry value
  <chr>   <chr>    <int>
1 A       X        99954
2 B       Y        27955
3 C       Z        36009
4 A       X         3061
5 B       Y        25612
6 C       Z        67307
7 A       X        96514
8 B       Y        97864
9 C       Z        43014
$x2007
# A tibble: 9 × 3
  country industry value
  <chr>   <chr>    <int>
1 A       X        83954
2 B       Y        96141
3 C       Z        62389
4 A       X        28568
5 B       Y        77503
6 C       Z        70458
7 A       X        34978
8 B       Y        35408
9 C       Z        68731
$x2008
# A tibble: 9 × 3
  country industry value
  <chr>   <chr>    <int>
1 A       X        29498
2 B       Y        62203
3 C       Z        98125
4 A       X        99549
5 B       Y        56839
6 C       Z        21621
7 A       X        84214
8 B       Y        85778
9 C       Z        90275
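If the printed list is not wanted, a variant of the pipeline can use purrr's iwalk, which calls a function on each list element together with its name and is intended for side effects such as writing files. The sketch below is an assumption-laden illustration, not the answerer's code: the seed, the temporary folder out_dir, and the use of names_prefix = "x" to drop the leading x from the year labels are all additions.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(readr)

# Rebuild the example panel from the question (seed added for reproducibility)
set.seed(1)
dat <- data.frame(
  country  = rep(c("A", "B", "C"), 3),
  industry = rep(c("X", "Y", "Z"), 3),
  x2006    = sample(1000:100000, 9),
  x2007    = sample(1000:100000, 9),
  x2008    = sample(1000:100000, 9)
)

# Hypothetical output folder; a temp directory keeps the working directory clean
out_dir <- file.path(tempdir(), "by_year_tidy")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

# Reshape to long format, stripping the "x" prefix so the groups are "2006", "2007", "2008"
by_year <- dat %>%
  pivot_longer(cols = starts_with("x"), names_to = "year",
               names_prefix = "x", values_to = "value") %>%
  split(.$year)

# iwalk passes each tibble (.x) and its list name (.y); nothing is printed
iwalk(by_year, ~ write_csv(select(.x, -year),
                           file.path(out_dir, paste0("dat_", .y, ".csv"))))
```

With these assumptions the files are named dat_2006.csv, dat_2007.csv and dat_2008.csv, each holding the country, industry and value columns.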