Python Regex: Match only lowercase words between anything
Question
I am trying to capture all English words that consist purely of lowercase letters ([a-z]+) and are delimited by any non-letter characters.
So this line:
> "你好" means "Hello" and "再见" is "See you".hello_me bye-you what6 58then end_of_this
should return:
> means, and, is, you, hello, me, bye, you, what, then, end, of, this
A word may be surrounded by anything except the letters a-z.
I have come up with this:
(?:[\s_\d\"\'-])?([a-z]+)(?:[-\s_\d\"\'])?
However, it matches 'ello' in Hello, whereas I want to skip words that are not entirely lowercase.
It would also be nice if the pattern worked with the ignore-case flag, as in
(?i)(?:[\s_\d\"\'-])?([a-z]+)(?:[-\s_\d\"\'])?
so that I can easily switch between matching only lowercase words and matching all words.
Answer 1
Score: 1
This works for the example, though it may miss some edge cases.
import re

s = '"你好" means "Hello" and "再见" is "See you".hello_me bye-you what6 58then end_of_this'
# Capture runs of lowercase letters delimited by a word boundary, a non-letter, or an underscore
print(re.findall(r'(?:\b|[^A-Za-z]|(?<=_))([a-z]+)(?:\b|[^A-Za-z]|(?=_))', s))
Output:
['means', 'and', 'is', 'you', 'hello', 'me', 'bye', 'you', 'what', 'then', 'end', 'of', 'this']
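One edge case that seems worth checking: the trailing alternative `[^A-Za-z]` consumes the delimiter rather than merely looking at it, so a word that directly follows a single digit can be skipped. The sketch below uses a made-up input (`what6then`, not from the question) and compares the pattern above with a lookaround-based pattern that consumes nothing around the word.

```python
import re

# Pattern from this answer, unchanged.
consuming = r'(?:\b|[^A-Za-z]|(?<=_))([a-z]+)(?:\b|[^A-Za-z]|(?=_))'

# Lookaround-based alternative: asserts "no letter before/after" without consuming.
lookaround = r'(?<![a-zA-Z])[a-z]+(?![a-zA-Z])'

# Hypothetical input: two lowercase words separated by a single digit.
sample = 'what6then'

# The first match swallows '6'; there is no \b between '6' and 't'
# (both are word characters), so 'then' has no usable left delimiter.
consuming_result = re.findall(consuming, sample)

# The lookarounds leave '6' in place, so both words are found.
lookaround_result = re.findall(lookaround, sample)

print(consuming_result)   # ['what']
print(lookaround_result)  # ['what', 'then']
```

Whether this matters depends on the input; for the example line in the question both patterns give the same result.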
Answer 2
Score: 1
This doesn't appear to be difficult.
You've described a custom word boundary: (?<![a-zA-Z]) before the word, which asserts that no letter precedes it, and (?![a-zA-Z]) after the word, which asserts that no letter follows it.
(?<![a-zA-Z])[a-z]+(?![a-zA-Z])
Code sample
import re

s = '"你好" means "Hello" and "再见" is "See you".hello_me bye-you what6 58then end_of_this'
# Match lowercase runs that have no ASCII letter immediately before or after them
print(re.findall(r'(?<![a-zA-Z])[a-z]+(?![a-zA-Z])', s))
Output
['means', 'and', 'is', 'you', 'hello', 'me', 'bye', 'you', 'what', 'then', 'end', 'of', 'this']
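The question also asked for a way to switch between lowercase-only matches and all words via the ignore-case flag. A minimal sketch of how that could look with this pattern follows; the helper name `find_words` and its `ignore_case` parameter are made up for illustration and are not part of the original answer.

```python
import re

# Same pattern as above: letters on either side disqualify a match.
PATTERN = r'(?<![a-zA-Z])[a-z]+(?![a-zA-Z])'


def find_words(text, ignore_case=False):
    """Return lowercase-only words, or all ASCII words if ignore_case is set.

    Hypothetical helper: with re.IGNORECASE, [a-z]+ also matches uppercase
    letters, while the lookarounds keep acting as letter boundaries.
    """
    flags = re.IGNORECASE if ignore_case else 0
    return re.findall(PATTERN, text, flags)


s = '"你好" means "Hello" and "再见" is "See you".hello_me bye-you what6 58then end_of_this'

lower_only = find_words(s)
all_words = find_words(s, ignore_case=True)

print(lower_only)
print(all_words)
```

With the flag set, `Hello` and `See` are expected to appear in the result alongside the lowercase words. The inline form `(?i)` at the very start of the pattern should behave the same way as passing `re.IGNORECASE`.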