Subset a List of Files between Two Strings in R
Question
I have a list of files named in a pattern similar to the one below:
filenames <- c("MERRA2_200.tavg1_2d_lnd_Nx.19950120.SUB.nc",
"MERRA2_200.tavg1_2d_lnd_Nx.19950121.SUB.nc",
"MERRA2_200.tavg1_2d_lnd_Nx.19950122.SUB.nc",
"MERRA2_200.tavg1_2d_lnd_Nx.19950123.SUB.nc",
"MERRA2_200.tavg1_2d_lnd_Nx.19950124.SUB.nc",
"MERRA2_200.tavg1_2d_lnd_Nx.19950125.SUB.nc",
"MERRA2_200.tavg1_2d_lnd_Nx.19950126.SUB.nc",
"MERRA2_200.tavg1_2d_lnd_Nx.19950127.SUB.nc",
"MERRA2_200.tavg1_2d_lnd_Nx.19950128.SUB.nc",
"MERRA2_200.tavg1_2d_lnd_Nx.19950129.SUB.nc")
I would like to extract the files that fall between two dates (given by the numbers after "Nx.").
For example, I would like one subset covering January 20, 1995 through January 23, 1995, another covering January 22, 1995 through January 25, 1995, and so on. I will run a separate analysis on each of these "mini-datasets".
I have tried the "stringr" package without success. The closest I've gotten is an expression like str_subset(x, "[a-u]") (with the filenames in place of "a" and "u"), but it did not work.
Answer 1
Score: 0
You can do this in three steps:
- Define the range of dates you want to extract
- Extract the date information from the filenames
- Filter the filenames to the dates in that range
library(tidyverse)

# Date range to extract (inclusive)
date_start <- as.Date('01-20-1995', format = '%m-%d-%Y')
date_end <- as.Date('01-23-1995', format = '%m-%d-%Y')
match_dates <- seq(date_start, date_end, by = 'day')

filenames_subset <- as.data.frame(filenames) %>%
  # Capture the 8-digit date stamp between "Nx." and ".SUB" (dots escaped so they match literally)
  mutate(date = str_match(filenames, 'Nx\\.(\\d{8})\\.SUB')[,2],
         date = as.Date(date, format = '%Y%m%d')) %>%
  filter(date %in% !!match_dates) %>%
  pull(filenames)
filenames_subset
[1] "MERRA2_200.tavg1_2d_lnd_Nx.19950120.SUB.nc"
[2] "MERRA2_200.tavg1_2d_lnd_Nx.19950121.SUB.nc"
[3] "MERRA2_200.tavg1_2d_lnd_Nx.19950122.SUB.nc"
[4] "MERRA2_200.tavg1_2d_lnd_Nx.19950123.SUB.nc"
Alternatively, you can do the following, which yields the same result:
date_start <- as.Date('19950120', format = '%Y%m%d')
date_end <- as.Date('19950123', format = '%Y%m%d')
match_dates <- format(seq(date_start, date_end, by = 'day'), '%Y%m%d')
stringr::str_subset(filenames, paste0(match_dates, collapse = '|'))
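Since the question asks for several overlapping subsets (January 20 to 23, January 22 to 25, and so on), the same idea can be repeated over a set of window start dates. The following is only a rough base R sketch: the helper name subset_by_date, the 4-day window length, and the 2-day step are assumptions taken from the example in the question, not something the original answer specifies.

```r
# Example file names from the question
filenames <- sprintf("MERRA2_200.tavg1_2d_lnd_Nx.199501%02d.SUB.nc", 20:29)

# Pull the 8-digit date stamp that follows "Nx." and parse it as a Date
file_dates <- as.Date(sub(".*Nx\\.(\\d{8})\\.SUB.*", "\\1", filenames),
                      format = "%Y%m%d")

# Hypothetical helper: keep the files whose date lies in [start, end], inclusive
subset_by_date <- function(files, dates, start, end) {
  files[dates >= start & dates <= end]
}

# Assumed window layout: each window is 4 days long and starts every 2 days
window_starts <- seq(as.Date("1995-01-20"), as.Date("1995-01-26"), by = "2 days")
window_length <- 4

# One character vector of file names per window
mini_datasets <- lapply(seq_along(window_starts), function(i) {
  start <- window_starts[i]
  subset_by_date(filenames, file_dates, start, start + window_length - 1)
})
names(mini_datasets) <- format(window_starts, "%Y%m%d")

# Files for the window that starts on January 22, 1995
mini_datasets[["19950122"]]
```

Comparing parsed dates against a start and end date avoids building a regular expression for every window, and each element of mini_datasets can then be passed to whatever analysis is run per subset.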