Capturing uppercase words in text with regex
Question
I'm trying to find words that are in uppercase in a given piece of text. The words must appear one after the other to be considered, and there must be at least 4 of them.
I have an "almost" working regex, but it captures much more: [A-Z]*(?: +[A-Z]*){4,}. The match also includes spaces at the start or the end of those words (like a boundary), and it picks up runs of spaces elsewhere in the text.
I have a playground if you want to test it out: https://regex101.com/r/BmXHFP/2
Is there a way to make the regex in the example capture only the words in the first sentence? The language I'm using is Go, and its regexp package has no look-behind/look-ahead support.
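To reproduce the problem with Go's regexp package, here is a minimal sketch. The regex101 sample text isn't available here, so the input string below is a made-up stand-in, and the helper name originalMatches is hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// The regex from the question. The second [A-Z]* may match zero letters,
// so one repetition of the group can consist of spaces alone.
var original = regexp.MustCompile(`[A-Z]*(?: +[A-Z]*){4,}`)

// originalMatches (hypothetical helper) returns every non-overlapping
// match of the question's regex.
func originalMatches(text string) []string {
	return original.FindAllString(text, -1)
}

func main() {
	// Hypothetical sample: one uppercase sentence, followed by lowercase
	// text that happens to contain four consecutive spaces.
	text := "THIS IS ALL CAPS HERE and then    some padding"
	for _, m := range originalMatches(text) {
		fmt.Printf("%q\n", m) // %q makes leading/trailing spaces visible
	}
}
```

This should print "THIS IS ALL CAPS HERE " (note the trailing space) and then a second match made of nothing but the four spaces, which is the "captures much more" behaviour described above.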
Answer 1
Score: 2
In your regex, you just need to change the second * to a +:
[A-Z]*(?: +[A-Z]+){4,}
### Explanation
With (?: +[A-Z]*), you are matching "one or more spaces followed by zero or more uppercase letters", so a repetition can consist of spaces alone. When you replace the * with a +, the spaces only match if at least one uppercase letter follows them.
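A minimal Go sketch of Answer 1's regex, using a made-up sample string and a hypothetical helper name fixedMatches. It also shows a side effect worth knowing: {4,} counts only the space-prefixed repetitions, and the leading [A-Z]* may still be empty.

```go
package main

import (
	"fmt"
	"regexp"
)

// Answer 1's regex: the second quantifier is + instead of *, so every
// repetition must end in at least one uppercase letter.
var fixed = regexp.MustCompile(`[A-Z]*(?: +[A-Z]+){4,}`)

// fixedMatches (hypothetical helper) returns every non-overlapping match.
func fixedMatches(text string) []string {
	return fixed.FindAllString(text, -1)
}

func main() {
	// Hypothetical sample text.
	text := "THIS IS ALL CAPS HERE and then    some padding"
	fmt.Printf("%q\n", fixedMatches(text))

	// The leading [A-Z]* can be empty, so four words that follow a space
	// are matched together with that space.
	fmt.Printf("%q\n", fixedMatches("x AB CD EF GH"))

	// No match: a match starting with a word needs four more words after it.
	fmt.Printf("%q\n", fixedMatches("AB CD EF GH"))
}
```

This should print ["THIS IS ALL CAPS HERE"], then [" AB CD EF GH"], then []. So the fix removes the stray space runs, but a run of exactly four words is matched only together with the space before it.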
Answer 2
Score: 1
Replace the *s with +s, and your regex only matches the words in the first sentence.
A * also matches the empty string. Looking at your regex and ignoring both [A-Z]* (each of which can match nothing), all that remains is a sequence of spaces. Using + makes sure that there is at least one uppercase letter between successive runs of spaces.
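Reading "the *s" as both quantifiers, here is a minimal Go sketch of that version. The pattern named atLeastFour is a hypothetical adjustment that does not come from any of the answers. It uses {3,} so that the first word plus at least three more gives at least four words, and Go's ASCII \b word boundary so that a match never starts or ends with a space. The sample text is made up.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// Both quantifiers switched to +: every word, including the first,
	// must contain at least one uppercase letter.
	bothPlus = regexp.MustCompile(`[A-Z]+(?: +[A-Z]+){4,}`)

	// Hypothetical adjustment (not from any answer): first word + {3,}
	// more words = at least four words, anchored on word boundaries.
	atLeastFour = regexp.MustCompile(`\b[A-Z]+(?: +[A-Z]+){3,}\b`)
)

func main() {
	// [A-Z]* is satisfied by the empty string; [A-Z]+ is not.
	fmt.Println(regexp.MustCompile(`^[A-Z]*$`).MatchString("")) // true
	fmt.Println(regexp.MustCompile(`^[A-Z]+$`).MatchString("")) // false

	// Hypothetical sample: a four-word run, then a five-word run.
	text := "x AB CD EF GH and then ONE TWO THREE FOUR FIVE"
	fmt.Printf("%q\n", bothPlus.FindAllString(text, -1))
	fmt.Printf("%q\n", atLeastFour.FindAllString(text, -1))
}
```

With both quantifiers set to +, {4,} means a first word plus four more, so only ["ONE TWO THREE FOUR FIVE"] is found. The adjusted pattern should return ["AB CD EF GH" "ONE TWO THREE FOUR FIVE"].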
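Answer 3
Score: 1
You have to require at least one uppercase letter after the spaces: [A-Z]*(?: +[A-Z]+){4,}. See the updated regex.
A better regex would also allow for no space between the letters: [A-Z]*(?: *[A-Z]+){4,}. See the better regex.
The * after the space means that at least one uppercase letter is still accepted even when there is no space.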
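Because the space in Answer 3's second regex is optional, each repetition can be a single letter. A minimal Go sketch to check what that does, using a made-up sample string:

```go
package main

import (
	"fmt"
	"regexp"
)

// Answer 3's "better" regex: the spaces inside the group are optional.
var answer3 = regexp.MustCompile(`[A-Z]*(?: *[A-Z]+){4,}`)

func main() {
	// Hypothetical sample: four short words, then a single uppercase word.
	text := "x AB CD EF GH and NASA"
	fmt.Printf("%q\n", answer3.FindAllString(text, -1))

	// With the space optional, one uppercase word of four or more letters
	// matches on its own; a three-letter word does not.
	fmt.Println(answer3.MatchString("NASA")) // true
	fmt.Println(answer3.MatchString("ABC"))  // false
}
```

This should print [" AB CD EF GH" " NASA"]: besides keeping the leading space, the optional space lets a single word such as NASA count as "four words", which may or may not be what the question intends.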