Extract dates in various formats from string in R
Question
I need to quickly extract dates from character vectors.
I have 2 main issues:
- Various date formats (European and American, alphanumeric and numeric...)
- Multiple dates in each vector.
My vectors look something like this:
c("11/09/2016 Invoice Number . Date P.O. # Amount Discount Paid Amount 2017/015 10/28/2016 CC6/ $50,000.00 $0.00 $50,000-00 2017/016 10/28/2016 CC67 $50,000.00 $0.00 $50,000-00 2017-017 10/28/2016 CC67 $50,000.00 . $0.00 $50,000.00 TOTALS: $150,000.00 $0.00 $150,000.00 ")
I have tried using parse_date and strptime without success.
I do not know regex syntax and do not really have time to dig into it.
Many thanks for your help.
Answer 1
Score: 2
We can use str_extract_all to extract all the dates with a pattern of two digits followed by /, then two digits, another /, and then four digits:
library(stringr)
# data (defined first so the call below runs as written)
v1 <- c("11/09/2016 Invoice Number . Date P.O. # Amount Discount Paid Amount 2017/015 10/28/2016 CC6/ $50,000.00 $0.00 $50,000-00 2017/016 10/28/2016 CC67 $50,000.00 $0.00 $50,000-00 2017-017 10/28/2016 CC67 $50,000.00 . $0.00 $50,000.00 TOTALS: $150,000.00 $0.00 $150,000.00 ")
# str_extract_all returns a list with one element per input string; [[1]] takes the matches for the first string
str_extract_all(v1, "\\d{2}/\\d{2}/\\d{4}")[[1]]
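For readers who would rather not load stringr, the same extraction can be sketched in base R with gregexpr and regmatches. This is only an illustration and not part of the original answer: the variable names pattern and dates_chr are made up here, and the pattern is slightly loosened (one or two digits for day and month, plus word boundaries) on the assumption that dates such as 1/9/2016 may also occur.

```r
# Base R sketch: extract every date-like token from the sample string.
v1 <- c("11/09/2016 Invoice Number . Date P.O. # Amount Discount Paid Amount 2017/015 10/28/2016 CC6/ $50,000.00 $0.00 $50,000-00 2017/016 10/28/2016 CC67 $50,000.00 $0.00 $50,000-00 2017-017 10/28/2016 CC67 $50,000.00 . $0.00 $50,000.00 TOTALS: $150,000.00 $0.00 $150,000.00 ")

# \\b (word boundary) stops a match from starting or ending inside a longer
# run of digits, so invoice numbers such as 2017/015 are not picked up.
pattern <- "\\b\\d{1,2}/\\d{1,2}/\\d{4}\\b"

# gregexpr finds all match positions; regmatches pulls out the matched text.
# The result is a list with one element per input string.
dates_chr <- regmatches(v1, gregexpr(pattern, v1, perl = TRUE))[[1]]
print(dates_chr)
# [1] "11/09/2016" "10/28/2016" "10/28/2016" "10/28/2016"
```

The result is still a character vector; turning it into dates requires deciding how to read ambiguous values such as 11/09/2016.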
Answer 2
Score: 2
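If you need R Date values, you have to decide whether the American (month/day/year) or the European (day/month/year) reading takes priority, because a string such as 11/09/2016 is valid under both: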
library(tidyverse)
library(lubridate)
#>
#> Attaching package: 'lubridate'
#> The following object is masked from 'package:base':
#>
#> date
v1 <- c("11/09/2016 Invoice Number . Date P.O. # Amount Discount Paid Amount 2017/015 10/28/2016 CC6/ $50,000.00 $0.00 $50,000-00 2017/016 10/28/2016 CC67 $50,000.00 $0.00 $50,000-00 2017-017 10/28/2016 CC67 $50,000.00 . $0.00 $50,000.00 TOTALS: $150,000.00 $0.00 $150,000.00")
str_extract_all(v1, "\\d{2}/\\d{2}/\\d{4}")[[1]] %>%
  tibble(value = .) %>%
  mutate(american_date = value %>% mdy,
         european_date = value %>% dmy,
         stronger_american = coalesce(american_date, european_date),
         stronger_european = coalesce(european_date, american_date))
#> Warning: 3 failed to parse.
#> # A tibble: 4 x 5
#>   value      american_date european_date stronger_american stronger_european
#>   <chr>      <date>        <date>        <date>            <date>
#> 1 11/09/2016 2016-11-09    2016-09-11    2016-11-09        2016-09-11
#> 2 10/28/2016 2016-10-28    NA            2016-10-28        2016-10-28
#> 3 10/28/2016 2016-10-28    NA            2016-10-28        2016-10-28
#> 4 10/28/2016 2016-10-28    NA            2016-10-28        2016-10-28
Created on 2020-01-06 by the reprex package (v0.3.0)
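The "try one convention, fall back to the other" idea can also be sketched in base R, which may help when more than two formats are involved. The helper parse_with_priority below is hypothetical (it does not come from the answer or from any package): it tries each as.Date format in the order given and keeps the first one that parses, so the order of the formats argument encodes which convention wins for ambiguous strings.

```r
# Hypothetical helper: parse each string with the first format that succeeds.
parse_with_priority <- function(x, formats) {
  out <- rep(as.Date(NA), length(x))      # start with all-missing dates
  for (fmt in formats) {
    todo <- is.na(out)                    # only retry strings not yet parsed
    out[todo] <- as.Date(x[todo], format = fmt)
  }
  out
}

dates_chr <- c("11/09/2016", "10/28/2016")

# European reading first: 11/09/2016 becomes 11 September; 10/28/2016 has no
# month 28, so it falls through to the American format.
european_first <- parse_with_priority(dates_chr, c("%d/%m/%Y", "%m/%d/%Y"))
print(european_first)
# [1] "2016-09-11" "2016-10-28"

# American reading first: 11/09/2016 becomes 9 November.
american_first <- parse_with_priority(dates_chr, c("%m/%d/%Y", "%d/%m/%Y"))
print(american_first)
# [1] "2016-11-09" "2016-10-28"
```

Further formats, for example "%Y-%m-%d" or a month-name format such as "%d %B %Y", could be appended to the vector. Month names are parsed according to the current locale, so that part is an assumption to check on your own system.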