April 11, 2023 01:09:39 · go

Extracting words broken up by white space & some specific characters

Question


I'm trying to extract 'words' from a string, specifically 'date' string components.

oct 12:30
2023 09:05 04
%yyyy %hh:%ii %mm
mar 2, 1945 * matches "2," instead of "2"
mar 2,1945  * matches "2,1945" instead of "2" "1945"
mar2,1945   * ideally, "mar2" should be "mar" "2"
01-02-03
04:05:06

I think I'm pretty close:
((^|%|[0-9]).+?(?=[,:]|\W|$))

but this is extracting "2,1945" as one item.
I tried ((^|%|[0-9]).+?(?=[[^,]:]|\W|$)) but that didn't help at all.

Basically, I need every word that is delimited by white space or by non-alphanumeric characters (e.g. `:`, `/`, `-`), and I also need a split wherever the text switches between letters and digits (e.g. mar2 should match mar and 2 separately).

Answer 1

Score: 0


(\d{1,4}|\w{1,10}|%\w{1,4})

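\d{1,4} matches 1 to 4 digits (covers all the numbers)
or
\w{1,10} matches 1 to 10 word characters (covers all the month names)
or
%\w{1,4} matches % followed by 1 to 4 word characters

Note that \w also matches digits and underscores, so as written "mar2" is matched as a single token and %5 is matched as well. If you don't want that, change \w to [a-zA-Z] in both places; then:

mar2,1945 -> mar 2 1945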

Answer 2

Score: 0


It is not entirely clear what input could be provided, so I'm partially guessing here.

Based on the combination of provided examples, I would suggest using this:

%?[a-zA-Z]+|%?\d+[a-zA-Z]*

It will match an optional `%` followed by letters, or an optional `%` followed by digits and optional trailing letters.

Example:
```none
oct 12:30 : ['oct', '12', '30']
2023 09:05 04 : ['2023', '09', '05', '04']
%yyyy %hh:%ii %mm : ['%yyyy', '%hh', '%ii', '%mm']
mar 2, 1945 : ['mar', '2', '1945']
mar 2,1945 : ['mar', '2', '1945']
mar2,1945 : ['mar', '2', '1945']
01-02-03 : ['01', '02', '03']
04:05:06 : ['04', '05', '06']
10th of April, 2023 : ['10th', 'of', 'April', '2023']
%d%Od of %MM, %yyyy : ['%d', '%Od', 'of', '%MM', '%yyyy']
```

Demo here.

Answer 3

Score: 0


You can try this regex with 3 capturing groups:

([a-zA-Z]+)[ ,]*(\d+)\,\s*(\d{4})

Demo here
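A short Go sketch of how the three capturing groups could be read back, assuming the `regexp` package from the standard library. `dateRe` and `parseDate` are hypothetical names, and the unnecessary backslash before the comma is omitted, which does not change what the pattern matches.

```go
package main

import (
	"fmt"
	"regexp"
)

// dateRe has three groups: month letters, day digits, four-digit year.
var dateRe = regexp.MustCompile(`([a-zA-Z]+)[ ,]*(\d+),\s*(\d{4})`)

// parseDate (hypothetical helper) returns the three groups, and ok=false
// when the input does not have the "month day, year" shape.
func parseDate(s string) (month, day, year string, ok bool) {
	m := dateRe.FindStringSubmatch(s)
	if m == nil {
		return "", "", "", false
	}
	// m[0] is the whole match; m[1..3] are the capturing groups.
	return m[1], m[2], m[3], true
}

func main() {
	for _, s := range []string{"mar 2, 1945", "mar 2,1945", "mar2,1945", "oct 12:30"} {
		month, day, year, ok := parseDate(s)
		fmt.Println(s, "->", month, day, year, ok)
	}
}
```

This pattern only covers the month, day, comma, year form; inputs such as `oct 12:30` or `01-02-03` from the question do not match it.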
