Capturing uppercase words in text with regex
Question
I'm trying to find uppercase words in a given piece of text. The words must appear one after the other to be considered, and there must be at least 4 of them.
I have an "almost" working pattern, but it captures much more than intended: [A-Z]*(?: +[A-Z]*){4,}. The match also includes spaces at the start or the end of those words (like a boundary).
I have a playground if you want to test it out: https://regex101.com/r/BmXHFP/2
Is there a way to make the regex in the example capture only the words in the first sentence? The language I'm using is Go, and it has no look-behind/look-ahead.
Answer 1
Score: 2
In your regex, you just need to change the second * to a +:
[A-Z]*(?: +[A-Z]+){4,}
###Explanation
With (?: +[A-Z]*), you are matching "a space followed by 0 or more letters", so the group can match spaces on their own. After replacing the * with a +, spaces are matched only when uppercase letters follow them.
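To see the effect of that one-character change, here is a minimal Go sketch that runs both patterns with the standard regexp package. The sample inputs and the variable names original and fixed are assumptions made for illustration; they are not taken from the regex101 playground.

```go
package main

import (
	"fmt"
	"regexp"
)

// Pattern from the question: the trailing [A-Z]* lets a repetition match spaces alone.
var original = regexp.MustCompile(`[A-Z]*(?: +[A-Z]*){4,}`)

// Pattern from this answer: every repetition must contain at least one uppercase letter.
var fixed = regexp.MustCompile(`[A-Z]*(?: +[A-Z]+){4,}`)

func main() {
	// Hypothetical sample inputs, chosen only to illustrate the difference.
	samples := []string{
		"THIS IS A TEST SENTENCE here",
		"a    b", // four spaces and no uppercase letters at all
	}
	for _, s := range samples {
		// %q makes leading and trailing spaces in each match visible.
		fmt.Printf("input    %q\n", s)
		fmt.Printf("original %q\n", original.FindString(s))
		fmt.Printf("fixed    %q\n\n", fixed.FindString(s))
	}
}
```

On these inputs the original pattern returns the uppercase words plus a trailing space and also matches the four bare spaces, while the fixed pattern returns the words without the trailing space and finds nothing in the second input.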
Answer 2
Score: 1
Replace the *s with +s, and your regex only matches the words in the first sentence.
Anything quantified with * (as in .*) also matches the empty string. Looking at your regex and ignoring both occurrences of [A-Z]*, all that remains is a sequence of spaces. Using + makes sure that there is at least one uppercase character between consecutive runs of spaces.
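The following Go sketch compares replacing only the second * with replacing both. The input text and the variable names are assumptions for illustration. The third pattern, with {3,}, is not from any of the answers; it is included only because a non-empty first word already counts as one of the four required words.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// Only the second * replaced: the first word may be empty, so a match can start on a space.
	secondPlus = regexp.MustCompile(`[A-Z]*(?: +[A-Z]+){4,}`)
	// Both * replaced: one leading word plus at least four more, so five words in total.
	bothPlus = regexp.MustCompile(`[A-Z]+(?: +[A-Z]+){4,}`)
	// Assumed variant, not from the answers: one leading word plus at least three more.
	fourWords = regexp.MustCompile(`[A-Z]+(?: +[A-Z]+){3,}`)
)

func main() {
	// Hypothetical input with exactly four consecutive uppercase words.
	text := "go THIS IS A TEST now"
	fmt.Printf("secondPlus %q\n", secondPlus.FindString(text))
	fmt.Printf("bothPlus   %q\n", bothPlus.FindString(text))
	fmt.Printf("fourWords  %q\n", fourWords.FindString(text))
}
```

With this input, secondPlus returns the four words with a leading space, bothPlus finds no match because it needs five words, and fourWords returns exactly the four words. Which behavior is right depends on whether "at least 4" is meant to include the first word.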
Answer 3
Score: 1
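You have to require at least one uppercase letter, as in [A-Z]*(?: +[A-Z]+){4,}; see the updated regex.
A better regex would also allow the space to be absent, as in [A-Z]*(?: *[A-Z]+){4,}. See the better regex.
The * after the space makes the space optional, so a run of at least one uppercase letter is accepted even when no space precedes it.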
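As a quick check of how the two patterns in this answer differ, here is a small Go sketch. The inputs and variable names are assumptions for illustration. One consequence worth noting: with the space optional, a single uppercase word of four or more letters can satisfy all four repetitions by itself, which may or may not be the intended behavior.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// First pattern from this answer: words must be separated by at least one space.
	spaced = regexp.MustCompile(`[A-Z]*(?: +[A-Z]+){4,}`)
	// Second pattern from this answer: the space between repetitions is optional.
	optionalSpace = regexp.MustCompile(`[A-Z]*(?: *[A-Z]+){4,}`)
)

func main() {
	// Hypothetical inputs chosen to show where the two patterns disagree.
	for _, s := range []string{"THIS IS A TEST SENTENCE", "one WORD only"} {
		// %q makes any leading or trailing space in a match visible.
		fmt.Printf("%q: spaced=%q optionalSpace=%q\n",
			s, spaced.FindString(s), optionalSpace.FindString(s))
	}
}
```

Both patterns return the whole first input. On the second input, spaced finds nothing, while optionalSpace matches " WORD" (including the preceding space), because each letter can count as its own repetition.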