June 22, 2023, 07:49:43

Python Regex Match only lower case words between anything

Question

I am trying to capture all English words (pure [a-z]+) that sit between any non-letter characters.

So this line:

> "你好" means "Hello" and "再见" is "See you".hello_me bye-you what6 58then end_of_this

would return:

> means, and, is, you, hello, me, bye, you, what, then, end, of, this

A word may be surrounded by anything other than letters.

I have come up with this:

(?:[\s_\d\"\'-])?([a-z]+)(?:[-\s_\d\"\'])?

Yet it matches 'ello' in Hello, whereas I want to skip any word that is not entirely lower case.

It would also be nice if the pattern allowed adding the ignore-case flag:

(?i)(?:[\s_\d\"\'-])?([a-z]+)(?:[-\s_\d\"\'])?

to easily switch between matching lower-case words only and matching all of them.

Answer 1

Score: 1

This works for the example. It might be missing some edge cases...

import re
s='"你好" means "Hello" and "再见" is "See you".hello_me bye-you what6 58then end_of_this'
print(re.findall(r'(?:\b|[^A-Za-z]|(?<=_))([a-z]+)(?:\b|[^A-Za-z]|(?=_))', s))

Output:

['means', 'and', 'is', 'you', 'hello', 'me', 'bye', 'you', 'what', 'then', 'end', 'of', 'this']
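
One such edge case, as a rough sketch: because this pattern consumes the separator after each word, a lower-case run that directly follows a consumed digit can be skipped. The input 'a1b' below is a made-up test string, not from the question, and the second pattern is the lookaround-only form shown for comparison.

```python
import re

# Pattern from this answer: the separator after a word is consumed by the match.
consuming = re.compile(r'(?:\b|[^A-Za-z]|(?<=_))([a-z]+)(?:\b|[^A-Za-z]|(?=_))')
# Lookaround-only form: nothing outside the word itself is consumed.
lookaround = re.compile(r'(?<![a-zA-Z])[a-z]+(?![a-zA-Z])')

sample = 'a1b'  # hypothetical input chosen to expose the difference

consuming_result = consuming.findall(sample)
lookaround_result = lookaround.findall(sample)

print(consuming_result)   # ['a']  ('1' was consumed, so 'b' has no valid left edge)
print(lookaround_result)  # ['a', 'b']
```

Whether 'b' should count here depends on what you want from digit-separated text; by the question's rules (what6, 58then) it probably should.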

Answer 2

Score: 1

This doesn't appear to be difficult.
You've described a custom word boundary: (?<![a-zA-Z]) before the word and (?![a-zA-Z]) after it. Both are zero-width lookarounds, so they check the neighbouring character without consuming it.

(?<![a-zA-Z])[a-z]+(?![a-zA-Z])

Code sample

import re
s='"你好" means "Hello" and "再见" is "See you".hello_me bye-you what6 58then end_of_this'
print(re.findall(r'(?<![a-zA-Z])[a-z]+(?![a-zA-Z])', s))

Output

['means', 'and', 'is', 'you', 'hello', 'me', 'bye', 'you', 'what', 'then', 'end', 'of', 'this']
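
On the ignore-case part of the question, here is a minimal sketch of how the switch could look with this pattern. The helper name find_words and its ignore_case parameter are my own, made up for illustration. With re.IGNORECASE the [a-z]+ part also accepts upper-case letters, while the lookarounds still keep each match to a whole run of letters.

```python
import re

# Same pattern as above; the flag decides whether [a-z] also accepts A-Z.
WORD = r'(?<![a-zA-Z])[a-z]+(?![a-zA-Z])'

def find_words(text, ignore_case=False):
    """Return letter-only words; lower-case only unless ignore_case is set."""
    flags = re.IGNORECASE if ignore_case else 0
    return re.findall(WORD, text, flags)

s = '"你好" means "Hello" and "再见" is "See you".hello_me bye-you what6 58then end_of_this'

lower_only = find_words(s)
all_words = find_words(s, ignore_case=True)

print(lower_only)
print(all_words)  # now also includes 'Hello' and 'See'
```

Prefixing the pattern with (?i) should behave the same as passing the flag. Note that in Python 3, ignore-case matching on str patterns can also let [a-z] match a few non-ASCII letters; add re.ASCII if that matters for your data.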
