Subset a List of Files between Two Strings in R

Question

I have a list of files in a similar pattern to below:

filenames <- c("MERRA2_200.tavg1_2d_lnd_Nx.19950120.SUB.nc",
               "MERRA2_200.tavg1_2d_lnd_Nx.19950121.SUB.nc",
               "MERRA2_200.tavg1_2d_lnd_Nx.19950122.SUB.nc",
               "MERRA2_200.tavg1_2d_lnd_Nx.19950123.SUB.nc",
               "MERRA2_200.tavg1_2d_lnd_Nx.19950124.SUB.nc",
               "MERRA2_200.tavg1_2d_lnd_Nx.19950125.SUB.nc",
               "MERRA2_200.tavg1_2d_lnd_Nx.19950126.SUB.nc",
               "MERRA2_200.tavg1_2d_lnd_Nx.19950127.SUB.nc",
               "MERRA2_200.tavg1_2d_lnd_Nx.19950128.SUB.nc",
               "MERRA2_200.tavg1_2d_lnd_Nx.19950129.SUB.nc")

I would like to extract every file between two dates (the date is the number that follows "Nx." in each filename).

For example, I would like a subset of January 20th, 1995 - January 23rd, 1995. I would like another subset of January 22nd, 1995 - January 25th, 1995, and so on. I will be conducting individual analyses on each one of these "mini-datasets".

I have tried the "stringr" package without success. The closest I got was the str_subset(x, "[a-u]") expression (with the filenames in place of the "a" and "u"), but that did not work either.

Answer 1

Score: 0

You can do the following:

  • Define the date range you want to extract
  • Extract the date information from your filenames
  • Filter on the dates that fall within that range

library(tidyverse)

# Date range to extract (both ends inclusive)
date_start <- as.Date('01-20-1995', format = '%m-%d-%Y')
date_end <- as.Date('01-23-1995', format = '%m-%d-%Y')

# Every day in the range
match_dates <- seq(date_start, date_end, by = 'day')

filenames_subset <- as.data.frame(filenames) %>%
  # Capture the 8-digit date between "Nx." and ".SUB" (literal dots escaped)
  mutate(date = str_match(filenames, 'Nx\\.(\\d{8})\\.SUB')[,2],
         date = as.Date(date, format = '%Y%m%d')) %>%
  filter(date %in% !!match_dates) %>%
  pull(filenames)

filenames_subset

[1] "MERRA2_200.tavg1_2d_lnd_Nx.19950120.SUB.nc"
[2] "MERRA2_200.tavg1_2d_lnd_Nx.19950121.SUB.nc"
[3] "MERRA2_200.tavg1_2d_lnd_Nx.19950122.SUB.nc"
[4] "MERRA2_200.tavg1_2d_lnd_Nx.19950123.SUB.nc"
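
If loading the tidyverse is not desirable, the same three steps can be sketched in base R. This is a minimal sketch and not part of the original answer: the helper name files_between is hypothetical, and it assumes every filename carries an 8-digit YYYYMMDD date between "Nx." and ".SUB". Instead of building a day-by-day sequence, it compares each date against the two endpoints directly.

```r
# Example filenames following the pattern from the question
filenames <- sprintf("MERRA2_200.tavg1_2d_lnd_Nx.199501%02d.SUB.nc", 20:29)

# Hypothetical helper: keep files whose embedded date lies in [start, end]
files_between <- function(files, start, end) {
  # Pull out the 8 digits between "Nx." and ".SUB"
  date_chr <- sub(".*Nx\\.(\\d{8})\\.SUB.*", "\\1", files)
  # Filenames that do not match the pattern become NA here
  dates <- as.Date(date_chr, format = "%Y%m%d")
  # Inclusive comparison against both endpoints; NA dates are dropped
  keep <- !is.na(dates) & dates >= as.Date(start) & dates <= as.Date(end)
  files[keep]
}

subset_1 <- files_between(filenames, "1995-01-20", "1995-01-23")
subset_2 <- files_between(filenames, "1995-01-22", "1995-01-25")
```

Here subset_1 and subset_2 correspond to the two example ranges in the question, so each "mini-dataset" is one call to the helper.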

Alternatively, you can do the following, which yields the same result:

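# Date range to extract (both ends inclusive)
date_start <- as.Date('19950120', format = '%Y%m%d')
date_end <- as.Date('19950123', format = '%Y%m%d')

# Every day in the range, formatted as it appears in the filenames
match_dates <- format(seq(date_start, date_end, by = 'day'), '%Y%m%d')

# Keep the filenames that contain any of those date strings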
stringr::str_subset(filenames, paste0(match_dates, collapse = '|'))
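
Since the question asks for several overlapping ranges (January 20-23, January 22-25, and so on), the pattern-matching idea can be repeated over a set of windows. The following is only a sketch under stated assumptions: the window length of 4 days and the step of 2 days are inferred from the two examples in the question, the names window_starts, window_length and subsets are hypothetical, and base R's grep(..., value = TRUE) is used in place of stringr::str_subset so that the block has no package dependencies.

```r
# Example filenames following the pattern from the question
filenames <- sprintf("MERRA2_200.tavg1_2d_lnd_Nx.199501%02d.SUB.nc", 20:29)

# Assumed window layout: 4-day windows starting every 2 days
window_starts <- seq(as.Date("1995-01-20"), as.Date("1995-01-26"), by = "2 days")
window_length <- 4  # days per window, endpoints inclusive

subsets <- lapply(seq_along(window_starts), function(i) {
  # All days in this window, formatted as they appear in the filenames
  days <- format(seq(window_starts[i], by = "day", length.out = window_length),
                 "%Y%m%d")
  # Match any of those dates, anchored between "Nx." and ".SUB"
  pattern <- paste0("Nx\\.(", paste(days, collapse = "|"), ")\\.SUB")
  grep(pattern, filenames, value = TRUE)
})

# Label each subset by the first day of its window
names(subsets) <- format(window_starts, "%Y%m%d")
```

Each element of subsets is then one "mini-dataset" of filenames, for example subsets[["19950122"]] for the January 22-25 window, and the analysis can be applied to each element in turn with lapply.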

huangapple
  • Posted on May 17, 2023, 10:38:01
  • When reposting, please keep the link to this post: https://go.coder-hub.com/76268232.html