May 17, 2023 10:38:01


Subset a List of Files between Two Strings in R

Question


I have a list of files whose names follow a pattern similar to the one below:

filenames <- c("MERRA2_200.tavg1_2d_lnd_Nx.19950120.SUB.nc",
               "MERRA2_200.tavg1_2d_lnd_Nx.19950121.SUB.nc",
               "MERRA2_200.tavg1_2d_lnd_Nx.19950122.SUB.nc",
               "MERRA2_200.tavg1_2d_lnd_Nx.19950123.SUB.nc",
               "MERRA2_200.tavg1_2d_lnd_Nx.19950124.SUB.nc",
               "MERRA2_200.tavg1_2d_lnd_Nx.19950125.SUB.nc",
               "MERRA2_200.tavg1_2d_lnd_Nx.19950126.SUB.nc",
               "MERRA2_200.tavg1_2d_lnd_Nx.19950127.SUB.nc",
               "MERRA2_200.tavg1_2d_lnd_Nx.19950128.SUB.nc",
               "MERRA2_200.tavg1_2d_lnd_Nx.19950129.SUB.nc")

I would like to extract each file between two dates (specified by the numbers after "Nx.").

For example, I would like a subset of January 20th, 1995 - January 23rd, 1995. I would like another subset of January 22nd, 1995 - January 25th, 1995, and so on. I will be conducting individual analyses on each one of these "mini-datasets".

I have tried working with the "stringr" package, without success. The closest I got was the expression str_subset(x, "[a-u]") (with the two filenames in place of "a" and "u"), but that did not work either.
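Why `[a-u]` fails: in a regular expression, square brackets define a character class, so `[a-u]` matches any single character from `a` to `u`, not every string that sorts between two strings. Because the stamps after `Nx.` are fixed-width `YYYYMMDD`, zero-padded dates sort the same way alphabetically as chronologically. A minimal sketch of a workaround, assuming every filename contains exactly one such 8-digit stamp between `Nx.` and `.SUB`, extracts the stamp and compares it as text. The helper `subset_between` is a hypothetical name, not part of stringr or base R.

```r
# Example filenames (the first six from the question)
filenames <- c("MERRA2_200.tavg1_2d_lnd_Nx.19950120.SUB.nc",
               "MERRA2_200.tavg1_2d_lnd_Nx.19950121.SUB.nc",
               "MERRA2_200.tavg1_2d_lnd_Nx.19950122.SUB.nc",
               "MERRA2_200.tavg1_2d_lnd_Nx.19950123.SUB.nc",
               "MERRA2_200.tavg1_2d_lnd_Nx.19950124.SUB.nc",
               "MERRA2_200.tavg1_2d_lnd_Nx.19950125.SUB.nc")

# A character class matches one character, so "[a-u]" hits every filename
all(grepl("[a-u]", filenames))  # TRUE

# Hypothetical helper: keep files whose YYYYMMDD stamp lies in [from, to]
subset_between <- function(files, from, to) {
  # Pull out the 8 digits between "Nx." and ".SUB"
  stamp <- sub(".*Nx\\.([0-9]{8})\\.SUB.*", "\\1", files)
  # Fixed-width, zero-padded stamps compare correctly as strings
  files[stamp >= from & stamp <= to]
}

subset_between(filenames, "19950120", "19950123")
```

The call at the end returns the four files stamped 19950120 through 19950123.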

Answer 1

Score: 0


You can do the following:

1. Define the range of dates you want to extract
2. Get the date information out of your filenames
3. Filter on the dates in that range

library(tidyverse)

# Define the range of dates to extract
date_start <- as.Date('01-20-1995', format = '%m-%d-%Y')
date_end <- as.Date('01-23-1995', format = '%m-%d-%Y')
match_dates <- seq(date_start, date_end, by = 'day')

# Pull the YYYYMMDD stamp after "Nx.", convert it to a Date,
# and keep only the files whose date falls in the range
filenames_subset <- as.data.frame(filenames) %>%
  mutate(date = str_match(filenames, 'Nx.(.*?).SUB')[,2],
         date = as.Date(date, format = '%Y%m%d')) %>%
  filter(date %in% !!match_dates) %>%
  pull(filenames)

filenames_subset
#> [1] "MERRA2_200.tavg1_2d_lnd_Nx.19950120.SUB.nc"
#> [2] "MERRA2_200.tavg1_2d_lnd_Nx.19950121.SUB.nc"
#> [3] "MERRA2_200.tavg1_2d_lnd_Nx.19950122.SUB.nc"
#> [4] "MERRA2_200.tavg1_2d_lnd_Nx.19950123.SUB.nc"
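The `str_match()` call does the extraction in this pipeline: it returns a character matrix whose first column holds the whole match and whose second column holds the first capture group, which is why `[,2]` is taken. The sketch below isolates that step on two filenames. Its pattern is a stricter variant (an assumption, not the answerer's original) that escapes the literal dots and requires exactly eight digits, so `.` cannot stand in for an arbitrary character.

```r
library(stringr)

filenames <- c("MERRA2_200.tavg1_2d_lnd_Nx.19950120.SUB.nc",
               "MERRA2_200.tavg1_2d_lnd_Nx.19950121.SUB.nc")

# Column 1: the whole match; column 2: the 8-digit capture group
m <- str_match(filenames, "Nx\\.(\\d{8})\\.SUB")
m[, 1]  # "Nx.19950120.SUB" "Nx.19950121.SUB"
m[, 2]  # "19950120" "19950121"

# Convert the captured stamps to Date objects for range filtering
as.Date(m[, 2], format = "%Y%m%d")
```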

Alternatively, you can do the following, which yields the same result:

date_start <- as.Date('19950120', format = '%Y%m%d')
date_end <- as.Date('19950123', format = '%Y%m%d')
match_dates <- format(seq(date_start, date_end, by = 'day'), '%Y%m%d')
stringr::str_subset(filenames, paste0(match_dates, collapse = '|'))
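The question also asks for a series of overlapping subsets (January 20 to 23, then January 22 to 25, and so on). One way to sketch this, assuming every window spans four days and consecutive windows start two days apart as in that example, is to generate the window start dates with `seq()` and apply the same date-range filter to each one with `lapply()`. The names `window_len`, `step`, `starts` and `windows` are illustrative, not from the answer.

```r
# The ten filenames from the question, rebuilt from their dates
filenames <- sprintf("MERRA2_200.tavg1_2d_lnd_Nx.%s.SUB.nc",
                     format(seq(as.Date("1995-01-20"), as.Date("1995-01-29"), by = "day"),
                            "%Y%m%d"))

# Date stamp of each file (the 8 digits after "Nx.")
file_dates <- as.Date(sub(".*Nx\\.([0-9]{8})\\..*", "\\1", filenames), format = "%Y%m%d")

window_len <- 4  # days per window (assumed from the 20-23 example)
step       <- 2  # days between window starts (assumed from 20 -> 22)

# Start dates chosen so that every window ends on or before the last file date
starts <- seq(min(file_dates), max(file_dates) - (window_len - 1), by = step)

# One character vector of filenames per window
windows <- lapply(starts, function(s) {
  filenames[file_dates >= s & file_dates <= s + (window_len - 1)]
})
names(windows) <- format(starts, "%Y%m%d")

# The "mini-dataset" starting on January 22nd, 1995
windows[["19950122"]]
```

With these ten files the windows start on January 20, 22, 24 and 26, and each element of `windows` can be handed to a separate analysis, for example with another `lapply()`.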
