June 19, 2023 10:54:12

Quickly split a dataframe by year in R

Question


I have a panel that looks like this:

country <- c("A","B","C","A","B","C","A","B","C")
industry <- c("X","Y","Z","X","Y","Z","X","Y","Z")
x2006 <- sample(1000:100000, 9)
x2007 <- sample(1000:100000, 9)
x2008 <- sample(1000:100000, 9)
dat <- data.frame(country, industry, x2006, x2007, x2008)

I am doing something very simple, like:

library(dplyr)
dat2006 <- dat %>%
    select(country, industry, x2006)

and then using write.csv to save it as its own file.

What is the best way to repeat this and save a separate file for each year (i.e. each column) in the data set?

Answer 1

Score: 2


You could use sapply:

sapply(grep("x", names(dat)), function(y)
       write.csv(dat[, c(1, 2, y)],
                 paste0(names(dat[y]), ".csv"),
                 row.names = FALSE)
)

grep() finds the columns whose names contain "x", and sapply() loops through them. Each CSV file is named after the selected column (for example x2006.csv) and saved in the working directory.
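The same grep() plus loop idea can be sketched with a couple of small changes, assuming you would rather write into a dedicated folder than the working directory and do not need the list of NULLs that sapply() prints. This is a minimal sketch, not the answer author's code: the fixed column values (instead of sample()), the anchored pattern "^x", and the names out_dir and year_cols are illustrative assumptions, and tempdir() is only a placeholder for your own path.

```r
# Panel in the same shape as the question's data (fixed values instead of sample())
dat <- data.frame(
  country  = c("A", "B", "C", "A", "B", "C", "A", "B", "C"),
  industry = c("X", "Y", "Z", "X", "Y", "Z", "X", "Y", "Z"),
  x2006 = 1:9, x2007 = 11:19, x2008 = 21:29
)

# Hypothetical output folder; replace tempdir() with your own path
out_dir <- tempdir()

# Anchored pattern: only columns whose names *start* with "x"
year_cols <- grep("^x", names(dat), value = TRUE)

# invisible() hides the list of NULLs returned by write.csv()
invisible(lapply(year_cols, function(col) {
  write.csv(dat[, c("country", "industry", col)],
            file.path(out_dir, paste0(col, ".csv")),
            row.names = FALSE)
}))

# Show which per-year files are now in the folder
list.files(out_dir, pattern = "^x\\d{4}\\.csv$")
```

In a fresh session the last call should list x2006.csv, x2007.csv and x2008.csv. Anchoring the pattern matters only if an ID column could also contain an "x" somewhere in its name.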

Note, you could specify the columns in other ways too. A few alternatives:

# using column locations (numbers) directly
sapply(3:5, function(y)
  write.csv(dat[, c(1, 2, y)],
            paste0(names(dat[y]), ".csv"),
            row.names = FALSE)
)
# using column names
sapply(c("x2006", "x2007", "x2008"), function(y)
       write.csv(dat[, c("country", "industry", y)],
                 paste0(names(dat[y]), ".csv"),
                 row.names = FALSE))
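Another way to specify the columns, sketched here under the assumption that every column other than country and industry is a year column, is to derive them with setdiff() and to separate the splitting step from the writing step. The names id_cols, year_cols, per_year and out_dir are hypothetical, the data uses fixed values so the result is reproducible, and tempdir() stands in for your own folder.

```r
# Panel in the same shape as the question's data (fixed values for reproducibility)
dat <- data.frame(
  country  = c("A", "B", "C", "A", "B", "C", "A", "B", "C"),
  industry = c("X", "Y", "Z", "X", "Y", "Z", "X", "Y", "Z"),
  x2006 = 1:9, x2007 = 11:19, x2008 = 21:29
)

# Assumption: every non-ID column holds one year of values
id_cols   <- c("country", "industry")
year_cols <- setdiff(names(dat), id_cols)

# Step 1: build a named list of per-year data frames (nothing is written yet)
per_year <- lapply(setNames(year_cols, year_cols),
                   function(col) dat[, c(id_cols, col)])

# Step 2: write each element; Map() pairs every data frame with its list name
out_dir <- tempdir()  # hypothetical output folder
invisible(Map(function(df, nm) {
  write.csv(df, file.path(out_dir, paste0(nm, ".csv")), row.names = FALSE)
}, per_year, names(per_year)))
```

Because per_year is an ordinary named list, you can inspect per_year$x2007 or drop a year before the write step runs.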

Answer 2

Score: 1


Here is a tidyverse solution:

library(tidyverse)
dat %>%
  pivot_longer(cols = starts_with("x"), names_to = "year", values_to = "value") %>%
  split(.$year) %>%
  map(~ select(.x, country, industry, value)) %>%
  map2(names(.), ~ write_csv(.x, file = paste0("dat_", .y, ".csv")))

$x2006
# A tibble: 9 × 3
  country industry value
  <chr>   <chr>    <int>
1 A       X        99954
2 B       Y        27955
3 C       Z        36009
4 A       X         3061
5 B       Y        25612
6 C       Z        67307
7 A       X        96514
8 B       Y        97864
9 C       Z        43014
$x2007
# A tibble: 9 × 3
  country industry value
  <chr>   <chr>    <int>
1 A       X        83954
2 B       Y        96141
3 C       Z        62389
4 A       X        28568
5 B       Y        77503
6 C       Z        70458
7 A       X        34978
8 B       Y        35408
9 C       Z        68731
$x2008
# A tibble: 9 × 3
  country industry value
  <chr>   <chr>    <int>
1 A       X        29498
2 B       Y        62203
3 C       Z        98125
4 A       X        99549
5 B       Y        56839
6 C       Z        21621
7 A       X        84214
8 B       Y        85778
9 C       Z        90275
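The listing above is simply the value returned by map2(): write_csv() returns its input invisibly, so map2() hands back the list of tibbles and the console prints it. A hedged variant, assuming purrr's iwalk() suits your workflow, writes the same files for its side effect only. It loads the individual packages instead of library(tidyverse), uses fixed values rather than sample(), and treats out_dir (set to tempdir()) as a hypothetical output folder.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(readr)

# Panel in the same shape as the question's data (fixed values for reproducibility)
dat <- data.frame(
  country  = c("A", "B", "C", "A", "B", "C", "A", "B", "C"),
  industry = c("X", "Y", "Z", "X", "Y", "Z", "X", "Y", "Z"),
  x2006 = 1:9, x2007 = 11:19, x2008 = 21:29
)

out_dir <- tempdir()  # hypothetical output folder

dat %>%
  # One row per country/industry/year; "year" holds the old column name
  pivot_longer(cols = starts_with("x"), names_to = "year", values_to = "value") %>%
  # Named list of tibbles: x2006, x2007, x2008
  split(.$year) %>%
  map(~ select(.x, country, industry, value)) %>%
  # iwalk() passes each element as .x and its name as .y, and returns its input invisibly
  iwalk(~ write_csv(.x, file.path(out_dir, paste0("dat_", .y, ".csv"))))
```

The resulting files keep the answer's naming scheme (dat_x2006.csv and so on); only the console output and the target folder differ.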
