Split large file in R into smaller files with a loop
Question
I have a CSV file with 12,626,756 rows that I need to split into smaller files so a colleague can open them in Excel. I want to write a loop that splits the data into pieces that fit within Excel's row limit and exports each piece as a CSV file until it reaches the end of the data (this should produce 13 files).
# STEP 1: load data
data <- read.csv(".../Desktop/Data/file.csv", header = TRUE)
# STEP 2: count rows
totalrows <- nrow(data)
# STEP 3: determine how many files are needed
excelrowlimit <- 1048576 - 5
filesrequired <- ceiling(totalrows / excelrowlimit)
For example:
csvfile 1 should contain rows 1:1048571
csvfile 2 should contain rows 1048572:2097142
csvfile 3 should contain rows 2097143:3145713
csvfile 4 should contain rows 3145714:4194284
... and so on
How can I write a loop that (1) splits the data into the number of files needed and (2) gives each CSV export a different file name?
Answer 1
Score: 1
Here's a solution expanding on my comment above. It should need less memory than approaches that first copy all or part of the original data frame into separate objects.
library(tidyverse)
# Maximum number of data rows per output file (Excel's limit of 1048576 minus 5)
rowCount <- 1048571
data %>%
  # Number the chunks 1, 2, 3, ... by row position
  mutate(Group = ceiling(row_number() / rowCount)) %>%
  group_by(Group) %>%
  # .x holds the rows of one group, .y is a one-row tibble with its Group value
  group_walk(
    function(.x, .y) {
      write.csv(.x, file = paste0("file", .y$Group, ".csv"))
    }
  )
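If memory is the main concern, one alternative (not the method above) is to avoid loading the file at all and stream it line by line through a connection. The following is only a minimal base R sketch: `split_csv_streaming`, `rows_per_file`, and `prefix` are hypothetical names introduced for illustration, and it assumes the first line is the header and that no quoted field contains a line break.

```r
# Hypothetical helper: copy a CSV into pieces of at most `rows_per_file`
# data rows, repeating the header line at the top of every piece.
# Assumption: every record occupies exactly one line of the file.
split_csv_streaming <- function(path, rows_per_file, prefix = "file") {
  con <- file(path, open = "r")
  on.exit(close(con))
  header <- readLines(con, n = 1)
  out_files <- character(0)
  repeat {
    # Continue reading from where the previous call stopped
    chunk <- readLines(con, n = rows_per_file)
    if (length(chunk) == 0) break
    out <- paste0(prefix, length(out_files) + 1, ".csv")
    writeLines(c(header, chunk), out)
    out_files <- c(out_files, out)
  }
  out_files
}
```

Called as `split_csv_streaming(".../Desktop/Data/file.csv", 1048571)`, this would write file1.csv, file2.csv, and so on into the working directory, holding only one chunk of text in memory at a time.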
Answer 2
Score: 0
Here's an example of how to achieve this, where you set the desired number of rows per file with split_at.
In the last step you can of course change the write_csv arguments as needed, e.g. to set a path or a delimiter.
library(tidyverse)
# Number of rows per output file
split_at <- 5
data.frame(x = 1:19) %>%
  # Zero-based chunk index: rows 1-5 get 0, rows 6-10 get 1, ...
  mutate(group = (row_number() - 1) %/% !! split_at) %>%
  group_split(group) %>%
  # Write each chunk to its own file, named after its chunk index
  map(.f = ~write_csv(.x, file = paste0('file ', unique(.x$group), '.csv')))
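The chunk index in this answer starts at 0, while the `ceiling()` form used elsewhere in this thread starts at 1. The short base R sketch below checks that the two formulas assign rows to the same chunks. The zero-padded file names built with `sprintf` are an illustrative assumption, not part of the answer; padding keeps 13 files in numeric order when sorted by name.

```r
split_at <- 5   # rows per file, as in the example above
n_rows <- 19    # size of the toy data set
row_id <- seq_len(n_rows)

# Zero-based index (this answer) and one-based index (ceiling form)
group_zero_based <- (row_id - 1) %/% split_at
group_one_based <- ceiling(row_id / split_at)

# The two differ only by the starting number
stopifnot(all(group_one_based == group_zero_based + 1))

# Hypothetical naming scheme: two-digit, zero-padded chunk numbers
file_names <- sprintf("file_%02d.csv", unique(group_one_based))
print(file_names)
```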
Answer 3
Score: 0
I'm assuming you want to split the data every 500 rows. You can use mutate to add a column that labels each group, then use a for loop to write out one CSV per group based on that column.
library(dplyr)
# STEP 1: load data
data <- read.csv(".../Desktop/Data/file.csv", header = TRUE)
# add a column that labels the group (500 rows per group)
data <- data %>% mutate(Group = ceiling(1:nrow(.) / 500))
# write out one CSV per group, dropping the helper column
for (i in unique(data$Group)) {
  data %>% filter(Group == i) %>% select(-Group) %>%
    write.csv(paste0("/your/path/", i, ".csv"))
}
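The same loop can also be written in base R by computing each file's first and last row directly, which avoids the helper column and the repeated filter. This is a rough sketch rather than a definitive implementation: `write_in_chunks`, `out_dir`, and `prefix` are hypothetical names, and the toy data frame stands in for the real file.

```r
# Hypothetical helper: write `data` to numbered CSV files holding at most
# `rows_per_file` rows each, and return the paths that were written.
write_in_chunks <- function(data, rows_per_file, out_dir = tempdir(), prefix = "file") {
  total_rows <- nrow(data)
  files_required <- ceiling(total_rows / rows_per_file)
  out_files <- character(files_required)
  for (i in seq_len(files_required)) {
    first_row <- (i - 1) * rows_per_file + 1
    last_row <- min(i * rows_per_file, total_rows)  # the last file may be shorter
    out_files[i] <- file.path(out_dir, paste0(prefix, i, ".csv"))
    write.csv(data[first_row:last_row, , drop = FALSE], out_files[i], row.names = FALSE)
  }
  out_files
}

# Toy run: 19 rows in chunks of 5 gives 4 files in the session's temp directory
demo <- data.frame(x = 1:19)
files <- write_in_chunks(demo, rows_per_file = 5)
```

For the file in the question, `rows_per_file` would be the asker's `excelrowlimit` (1048576 - 5), and `ceiling(12626756 / excelrowlimit)` gives the expected 13 files.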