Extracting words broken up by white space & some specific characters
Question
I'm trying to extract "words" from a string, specifically the components of date strings.
oct 12:30
2023 09:05 04
%yyyy %hh:%ii %mm
mar 2, 1945 * matches "2," instead of "2"
mar 2,1945 * matches "2,1945" instead of "2" and "1945"
mar2,1945 * ideally, "mar2" should be "mar" and "2"
01-02-03
04:05:06
I think I'm pretty close:
((^|%|[0-9]).+?(?=[,:]|\W|$))
but this extracts "2,1945" as one item.
I tried ((^|%|[0-9]).+?(?=[[^,]:]|\W|$))
but that didn't help at all.
Basically, I need every word delimited by white space or by non-alphanumeric characters (e.g. : / - etc.). Words should also be split wherever they switch between letters and digits (e.g. mar2 should match mar and 2 separately).
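The behaviour described in the question can be reproduced with a minimal sketch. Python's `re` module is an assumption here, since the question does not name a regex flavour, and `tokens` is a hypothetical helper name. The lazy `.+?` has to consume at least one character after the leading digit, which appears to be why the comma gets pulled into the match.

```python
import re

# The pattern from the question, unchanged.
PATTERN = re.compile(r"((^|%|[0-9]).+?(?=[,:]|\W|$))")

def tokens(text):
    """Return the full text of every match (hypothetical helper)."""
    return [m.group(0) for m in PATTERN.finditer(text)]

print(tokens("mar 2, 1945"))  # the "2," token keeps its comma
print(tokens("mar 2,1945"))   # "2,1945" comes back as a single token
```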
Answer 1
Score: 0
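(\d{1,4}|\w{1,10}|%\w{1,4})
\d{1,4} matches a number of 1 to 4 digits (covers all the numbers)
or
\w{1,10} matches 1 to 10 word characters (covers all the month names)
or
%\w{1,4} matches % followed by 1 to 4 word characters
Note that \w also matches digits, so as written mar2 stays a single token and %5 is matched as well. If you don't want that, change \w to [a-zA-Z] in both places; then:
mar2,1945 -> mar 2 1945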
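A small sketch comparing the pattern as posted with the `[a-zA-Z]` variant may make the difference concrete. Python's `re` module is an assumption, and the names `AS_POSTED` and `LETTERS_ONLY` are made up for illustration. Because `\w` matches digits as well as letters, the two variants tokenize `mar2` and `%5` differently.

```python
import re

# Pattern as posted, and the variant with \w replaced by [a-zA-Z].
AS_POSTED = re.compile(r"(\d{1,4}|\w{1,10}|%\w{1,4})")
LETTERS_ONLY = re.compile(r"(\d{1,4}|[a-zA-Z]{1,10}|%[a-zA-Z]{1,4})")

for sample in ["mar2,1945", "%yyyy %hh:%ii %mm", "%5"]:
    # findall returns the text of the single capture group for each match.
    print(sample)
    print("  as posted   :", AS_POSTED.findall(sample))
    print("  letters only:", LETTERS_ONLY.findall(sample))
```

With the letters-only variant, `mar2` is split into `mar` and `2`, and `%5` yields just `5`.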
Answer 2
Score: 0
It is not entirely clear what input could be provided, so I'm partially guessing here.
Based on the combination of examples provided, I would suggest using this:
`%?[a-zA-Z]+|%?\d+[a-zA-Z]*`
It matches an optional `%` followed by letters, or an optional `%` followed by digits and optional trailing letters.
Example:
```none
oct 12:30 : ['oct', '12', '30']
2023 09:05 04 : ['2023', '09', '05', '04']
%yyyy %hh:%ii %mm : ['%yyyy', '%hh', '%ii', '%mm']
mar 2, 1945 : ['mar', '2', '1945']
mar 2,1945 : ['mar', '2', '1945']
mar2,1945 : ['mar', '2', '1945']
01-02-03 : ['01', '02', '03']
04:05:06 : ['04', '05', '06']
10th of April, 2023 : ['10th', 'of', 'April', '2023']
%d%Od of %MM, %yyyy : ['%d', '%Od', 'of', '%MM', '%yyyy']
```
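The example listing for this answer can be regenerated with a few lines of Python. This is a sketch that assumes Python's `re` module (the output format suggests it, but the answer does not say so); the names `TOKEN` and `samples` are illustrative only.

```python
import re

# The regex suggested in this answer.
TOKEN = re.compile(r"%?[a-zA-Z]+|%?\d+[a-zA-Z]*")

samples = [
    "oct 12:30",
    "%yyyy %hh:%ii %mm",
    "mar2,1945",
    "10th of April, 2023",
    "%d%Od of %MM, %yyyy",
]

for s in samples:
    # No capture groups in the pattern, so findall returns whole matches.
    print(f"{s} : {TOKEN.findall(s)}")
```

Since the letters alternative is tried first, `mar2` is split into `mar` and `2`, while `10th` stays whole because it starts with digits.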
Answer 3
Score: 0
You can try this regular expression, which has three capture groups:
([a-zA-Z]+)[ ,](\d+),\s(\d{4})
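A quick way to see what the three capture groups return is sketched below, assuming Python's `re` module; `parse` is a hypothetical helper name. The pattern requires a comma followed by a whitespace character before the four-digit year, so it appears to cover only the `mar 2, 1945` form among the question's examples.

```python
import re

# The regex from this answer: month letters, day digits, 4-digit year.
DATE = re.compile(r"([a-zA-Z]+)[ ,](\d+),\s(\d{4})")

def parse(text):
    """Return the three captured groups, or None when nothing matches."""
    m = DATE.search(text)
    return m.groups() if m else None

print(parse("mar 2, 1945"))  # month, day and year as a tuple
print(parse("mar 2,1945"))   # no whitespace after the comma
print(parse("mar2,1945"))    # no separator between letters and digits
```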