Posted: 2020-01-07 01:39:15


Extract dates in various formats from string in R

Question


I need to quickly extract dates from character vectors.
I have 2 main issues:

1. Various date formats (European and American, alphanumeric and numeric...)
2. Multiple dates in each vector.

My vectors are something as follows:

c("11/09/2016 Invoice Number . Date P.O. # Amount Discount Paid Amount 2017/015 10/28/2016 CC6/ $50,000.00 $0.00 $50,000-00 2017/016 10/28/2016 CC67 $50,000.00 $0.00 $50,000-00 2017-017 10/28/2016 CC67 $50,000.00 . $0.00 $50,000.00 TOTALS: $150,000.00 $0.00 $150,000.00     ")

I have tried using parse_date and strptime without success.
I do not know anything about the regex syntax and do not really have time to dig into it.

Thank you very much for your help.

Answer 1

Score: 2

We can use str_extract_all from stringr to extract every date that matches this pattern: two digits, a /, two digits, another /, and then four digits. The vector v1 is defined in the data section below.

library(stringr)
# str_extract_all returns a list with one element per input string;
# [[1]] picks the matches found in the first (and only) string of v1
str_extract_all(v1, "\\d{2}/\\d{2}/\\d{4}")[[1]]

### data

v1 <- c("11/09/2016 Invoice Number . Date P.O. # Amount Discount Paid Amount 2017/015 10/28/2016 CC6/ $50,000.00 $0.00 $50,000-00 2017/016 10/28/2016 CC67 $50,000.00 $0.00 $50,000-00 2017-017 10/28/2016 CC67 $50,000.00 . $0.00 $50,000.00 TOTALS: $150,000.00 $0.00 $150,000.00 ")
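
The pattern above only covers dates written as dd/dd/dddd. Since the question also mentions other numeric and alphanumeric formats, here is a minimal base R sketch of how the same idea could be widened. The helper name extract_dates, the two sub-patterns, and the example strings are illustrative assumptions, not part of the original answer, and the set of formats you actually need may differ.

```r
# Numeric dates: 1-2 digits, a separator (/ . or -), 1-2 digits,
# a separator, 4 digits. \\b keeps tokens like "2017/015" from matching.
numeric_pat <- "\\b\\d{1,2}[/.-]\\d{1,2}[/.-]\\d{4}\\b"

# Alphanumeric dates such as "5 March 2020" or "5 Mar 2020" (English month names only).
month_names <- "(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*"
named_pat <- paste0("\\b\\d{1,2} ", month_names, " \\d{4}\\b")

pattern <- paste(numeric_pat, named_pat, sep = "|")

# Hypothetical helper: returns a list with one character vector of matches per input string.
extract_dates <- function(x) {
  regmatches(x, gregexpr(pattern, x, perl = TRUE))
}

# Made-up example strings, for illustration only.
examples <- c("Paid 3-4-2019 and 28.10.2016",
              "Due 5 March 2020, invoice 2017/015")
found <- extract_dates(examples)
found[[1]]  # "3-4-2019" "28.10.2016"
found[[2]]  # "5 March 2020"
```

Note that this sketch accepts mixed separators (for example 3-4/2019) and does not check that the day and month are valid; that check happens when the strings are converted to dates.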


Answer 2

Score: 2


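If you need actual R Date values, you have to decide which format wins when a string is ambiguous: American (month first) or European (day first). The code below parses each match both ways and builds one column per preference, falling back to the other format when the preferred one fails to parse.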

library(tidyverse)
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following object is masked from 'package:base':
#> 
#>     date

v1 <-  c("11/09/2016 Invoice Number . Date P.O. # Amount Discount Paid Amount 2017/015 10/28/2016 CC6/ $50,000.00 $0.00 $50,000-00 2017/016 10/28/2016 CC67 $50,000.00 $0.00 $50,000-00 2017-017 10/28/2016 CC67 $50,000.00 . $0.00 $50,000.00 TOTALS: $150,000.00 $0.00 $150,000.00")

str_extract_all(v1, "\\d{2}/\\d{2}/\\d{4}")[[1]] %>% 
  tibble(value = .) %>% 
  mutate(american_date = value %>% mdy,
         european_date = value %>% dmy,
         stronger_american = coalesce(american_date,european_date),
         stronger_european = coalesce(european_date,american_date))
#> Warning: 3 failed to parse.
#> # A tibble: 4 x 5
#>   value      american_date european_date stronger_american stronger_european
#>   <chr>      <date>        <date>        <date>            <date>           
#> 1 11/09/2016 2016-11-09    2016-09-11    2016-11-09        2016-09-11       
#> 2 10/28/2016 2016-10-28    NA            2016-10-28        2016-10-28       
#> 3 10/28/2016 2016-10-28    NA            2016-10-28        2016-10-28       
#> 4 10/28/2016 2016-10-28    NA            2016-10-28        2016-10-28

Created on 2020-01-06 by the reprex package (v0.3.0)
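
The same "prefer one format, fall back to the other" idea can also be sketched in base R without tidyverse or lubridate. The function name parse_dates, its prefer argument, and the ambiguous flag are hypothetical names introduced here for illustration; the sketch assumes the input strings already have the dd/dd/dddd shape extracted above.

```r
# Hypothetical helper: parse "xx/xx/yyyy" strings, trying the preferred
# reading first and falling back to the other one where it yields NA.
parse_dates <- function(x, prefer = c("american", "european")) {
  prefer <- match.arg(prefer)
  american <- as.Date(x, format = "%m/%d/%Y")  # month first
  european <- as.Date(x, format = "%d/%m/%Y")  # day first
  first  <- if (prefer == "american") american else european
  second <- if (prefer == "american") european else american
  unparsed <- is.na(first)
  first[unparsed] <- second[unparsed]          # keeps the Date class
  first
}

# Flag strings that parse under both readings but give different dates.
is_ambiguous <- function(x) {
  american <- as.Date(x, format = "%m/%d/%Y")
  european <- as.Date(x, format = "%d/%m/%Y")
  !is.na(american) & !is.na(european) & american != european
}

dates <- c("11/09/2016", "10/28/2016")
parse_dates(dates, prefer = "american")  # "2016-11-09" "2016-10-28"
parse_dates(dates, prefer = "european")  # "2016-09-11" "2016-10-28"
is_ambiguous(dates)                      # TRUE FALSE
```

Flagging the ambiguous strings makes it possible to review them by hand instead of silently trusting either preference.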
