Python Regex Match only lower case words between anything

Question

I am trying to capture all lowercase English words (runs of pure [a-z]+) that sit between any non-letter characters.

So this line:

> "你好" means "Hello" and "再见" is "See you".hello_me bye-you what6 58then end_of_this

should return:

> means, and, is, you, hello, me, bye, you, what, then, end, of, this

A word may be surrounded by anything other than a-z letters.

I have come up with this:

(?:[\s_\d\"\'-])?([a-z]+)(?:[-\s_\d\"\'])?

Yet it matches 'ello' in Hello, and I want to skip any word that is not entirely lowercase.

It would also be nice if the pattern were designed so that adding the ignore-case flag works:

(?i)(?:[\s_\d\"\'-])?([a-z]+)(?:[-\s_\d\"\'])?

That way I could easily switch between matching only lowercase words and matching all of them.

Answer 1

Score: 1

This works for the example, though it might be missing some edge cases.

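import re

# Sample line from the question.
s = '"你好" means "Hello" and "再见" is "See you".hello_me bye-you what6 58then end_of_this'
# Capture each lowercase run bounded by a word boundary, a non-letter, or an underscore.
print(re.findall(r'(?:\b|[^A-Za-z]|(?<=_))([a-z]+)(?:\b|[^A-Za-z]|(?=_))', s))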

Output:

['means', 'and', 'is', 'you', 'hello', 'me', 'bye', 'you', 'what', 'then', 'end', 'of', 'this']
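
As for the edge cases, one that seems to slip through: the `[^A-Za-z]` alternative consumes the separator character, so a word that directly follows a single digit can be skipped. Below is a minimal sketch of that behaviour; the input `sample` and the variable names are made up for illustration and are not from the original post.

```python
import re

# Pattern from this answer, unchanged.
pattern = r'(?:\b|[^A-Za-z]|(?<=_))([a-z]+)(?:\b|[^A-Za-z]|(?=_))'

# Hypothetical input: two lowercase words separated by a single digit.
sample = 'what6then'

# The trailing [^A-Za-z] consumes the '6'. Nothing is then left to
# introduce 'then': there is no \b between '6' and 't', no non-letter
# to consume, and no preceding underscore.
edge_result = re.findall(pattern, sample)
print(edge_result)  # ['what']
```

Separators that are only asserted and never consumed (lookarounds) would avoid this, since each word is then checked independently of its neighbours.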

Answer 2

Score: 1

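This doesn't appear to be difficult.
You've described a custom word boundary: a negative lookbehind (?<![a-zA-Z]) before the word
and a negative lookahead (?![a-zA-Z]) after it. Neither consumes any characters, so adjacent words can still match.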

(?<![a-zA-Z])[a-z]+(?![a-zA-Z])

Code sample

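import re

# Sample line from the question.
s = '"你好" means "Hello" and "再见" is "See you".hello_me bye-you what6 58then end_of_this'
# Match lowercase runs that have no ASCII letter immediately before or after.
print(re.findall(r'(?<![a-zA-Z])[a-z]+(?![a-zA-Z])', s))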

Output

['means', 'and', 'is', 'you', 'hello', 'me', 'bye', 'you', 'what', 'then', 'end', 'of', 'this']
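
The question also asked about switching to case-insensitive matching. Because both lookarounds already list `a-zA-Z`, the same pattern should work with `re.IGNORECASE`, which makes `[a-z]+` accept uppercase letters too. A minimal sketch, assuming a hypothetical helper `find_words` (the name and the `ignore_case` parameter are not from the original answer):

```python
import re

# Same pattern as above; the lookarounds reject any ASCII-letter neighbour.
WORD = r'(?<![a-zA-Z])[a-z]+(?![a-zA-Z])'


def find_words(text, ignore_case=False):
    """Return letter runs with no ASCII letter directly before or after.

    Hypothetical helper: with ignore_case=True, [a-z]+ also matches
    uppercase letters, so mixed-case words are returned as well.
    """
    flags = re.IGNORECASE if ignore_case else 0
    return re.findall(WORD, text, flags)


s = '"你好" means "Hello" and "再见" is "See you".hello_me bye-you what6 58then end_of_this'

lower_only = find_words(s)
all_words = find_words(s, ignore_case=True)

print(lower_only)
# ['means', 'and', 'is', 'you', 'hello', 'me', 'bye', 'you', 'what', 'then', 'end', 'of', 'this']
print(all_words)
# ['means', 'Hello', 'and', 'is', 'See', 'you', 'hello', 'me', 'bye', 'you', 'what', 'then', 'end', 'of', 'this']
```

Passing the flag as an argument rather than embedding `(?i)` in the pattern keeps a single pattern string for both modes.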

huangapple
  • Posted on June 22, 2023, 07:49:43
  • Please keep this link when reposting: https://go.coder-hub.com/76527805.html