Create a data frame in R from a string that follows a specific keyword
Question
I have a string like this:
"05/05/2005 ANNIVERSARY $367.62 ANNUAL DIVIDEND DECLARED UNDER THE PAIO UP ADDITIONS 20,965 2,203 23,168 | PAID UP ADDITION OPTION. $367.62 PURCHASED PAID UP ADDITIONS OF 2,203 02/15/2006 WITHDRAWAL ($77.50) VALUE OF PAID UP ADDITIONS OF 464 PAID UP ADDITIONS 23,168 (464) 22,704 APPLIED TOWARDS CHECK-O-MATIC PREMIUM DUE 03/05/2006 04/11/2006 05/05/2006 ANNIVERSARY $415.70"
I would like to create a data frame in R that holds the date and dollar amount around each occurrence of the word ANNIVERSARY in the string (the date before it, the amount after it). Expected output:
Date        Dividend
05/05/2005  $367.62
05/05/2006  $415.70
I tried splitting the string with str_split but don't know where to go from there.
Thank you in advance.
Answer 1
Score: 1
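If we just want to extract the dates and dollar amounts, we can use str_extract_all
with regex lookarounds (or, in newer versions of stringr, a capture group)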
library(stringr)
library(tibble)
# dates that are followed by the word ANNIVERSARY (lookahead)
dates <- str_extract_all(str1, "\\d{2}/\\d{2}/\\d{4}(?=\\s+ANNIVERSARY)")[[1]]
# dollar amounts that are preceded by "ANNIVERSARY " (lookbehind)
amounts <- str_extract_all(str1, "(?<=ANNIVERSARY )\\$[0-9.]+")[[1]]
tibble(dates, amounts)
# A tibble: 2 × 2
  dates      amounts
  <chr>      <chr>
1 05/05/2005 $367.62
2 05/05/2006 $415.70
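The capture-group route mentioned above can be sketched with str_match_all, which returns the full match plus one column per group. This is only an illustrative sketch, not the answerer's code. It assumes every amount looks like `$` followed by digits, optional thousands separators and an optional decimal part. The shortened `str1`, the `pattern` and `result` names, and the conversion to Date/numeric columns are additions for the example.

```r
library(stringr)

# Shortened copy of the question's string, used here only for illustration
str1 <- "05/05/2005 ANNIVERSARY $367.62 ANNUAL DIVIDEND DECLARED 02/15/2006 WITHDRAWAL ($77.50) PREMIUM DUE 03/05/2006 04/11/2006 05/05/2006 ANNIVERSARY $415.70"

# One pattern with two capture groups: (date) ANNIVERSARY (amount)
pattern <- "(\\d{2}/\\d{2}/\\d{4})\\s+ANNIVERSARY\\s+(\\$\\d[\\d,]*(?:\\.\\d+)?)"

# str_match_all() returns a list with one character matrix per input string:
# column 1 is the full match, columns 2 and 3 are the capture groups
m <- str_match_all(str1, pattern)[[1]]

# Assumption: typed columns are wanted, so parse the date and strip "$" and ","
result <- data.frame(
  Date = as.Date(m[, 2], format = "%m/%d/%Y"),
  Dividend = as.numeric(gsub("[$,]", "", m[, 3]))
)
result
```

Because the date and amount come from the same match, the two columns cannot drift out of step if one of them is missing next to an ANNIVERSARY keyword, which could happen when they are extracted in two separate passes.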
Another option is to extract the substrings containing 'ANNIVERSARY' and read them
with read.table/fread
library(data.table)
# each extracted chunk is "<date> ANNIVERSARY <amount>"; drop = 2 discards the keyword column
fread(text = str_extract_all(str1, "\\S+\\s+ANNIVERSARY\\s+\\S+")[[1]],
      header = FALSE, col.names = c("dates", "amounts"), drop = 2)
        dates amounts
1: 05/05/2005 $367.62
2: 05/05/2006 $415.70
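For the read.table variant, a base R sketch with no extra packages might look like the following. It assumes each extracted chunk has exactly three whitespace-separated fields (date, keyword, amount). The shortened `str1` and the names `chunks`, `out` and `keyword` are hypothetical, chosen only for this example.

```r
# Shortened copy of the question's string, used here only for illustration
str1 <- "05/05/2005 ANNIVERSARY $367.62 ANNUAL DIVIDEND DECLARED 02/15/2006 WITHDRAWAL ($77.50) PREMIUM DUE 03/05/2006 04/11/2006 05/05/2006 ANNIVERSARY $415.70"

# Pull out each "<date> ANNIVERSARY <amount>" chunk with base R regex functions
chunks <- regmatches(
  str1,
  gregexpr("\\S+\\s+ANNIVERSARY\\s+\\S+", str1, perl = TRUE)
)[[1]]

# Read the chunks as whitespace-separated rows of three fields
out <- read.table(
  text = chunks, header = FALSE,
  col.names = c("Date", "keyword", "Dividend"),
  stringsAsFactors = FALSE
)

# Drop the constant keyword column
out <- out[, c("Date", "Dividend")]
out
```

Both columns come back as character here, as in the fread output above. Convert them afterwards if dates or numbers are needed.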
Data
str1 <- "05/05/2005 ANNIVERSARY $367.62 ANNUAL DIVIDEND DECLARED UNDER THE PAIO UP ADDITIONS 20,965 2,203 23,168 | PAID UP ADDITION OPTION. $367.62 PURCHASED PAID UP ADDITIONS OF 2,203 02/15/2006 WITHDRAWAL ($77.50) VALUE OF PAID UP ADDITIONS OF 464 PAID UP ADDITIONS 23,168 (464) 22,704 APPLIED TOWARDS CHECK-O-MATIC PREMIUM DUE 03/05/2006 04/11/2006 05/05/2006 ANNIVERSARY $415.70"