Capturing uppercase words in text with regex

Question

I'm trying to find uppercase words in a given piece of text. The words must appear consecutively, and there must be at least 4 of them.

I have "almost" working code, but it captures much more than it should: [A-Z]*(?: +[A-Z]*){4,}. The match also includes spaces at the start or the end of those words (like a boundary).

I have a playground if you want to test it out: https://regex101.com/r/BmXHFP/2

Is there a way to make the regex in the example capture only the words in the first sentence? The language I'm using is Go, which has no look-behind/look-ahead.

Answer 1

Score: 2

In your regex, you just need to change the second * to a +:

[A-Z]*(?: +[A-Z]+){4,}

### Explanation
With (?: +[A-Z]*), you are matching "one or more spaces followed by zero or more uppercase letters", so the group can match spaces alone. After replacing the * with a +, spaces match only when at least one uppercase letter follows them.
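To make the difference concrete, here is a minimal Go sketch that runs both patterns over the same input. The sample sentence and the helper name `matches` are assumptions made for illustration; they do not come from the regex101 playground.

```go
package main

import (
	"fmt"
	"regexp"
)

// Pattern from the question: every [A-Z]* may be empty, so a run of
// four or more spaces can match on its own.
var original = regexp.MustCompile(`[A-Z]*(?: +[A-Z]*){4,}`)

// Pattern from the answer: each run of spaces must be followed by at
// least one uppercase letter.
var fixed = regexp.MustCompile(`[A-Z]*(?: +[A-Z]+){4,}`)

// matches returns every non-overlapping match of re in text.
func matches(re *regexp.Regexp, text string) []string {
	return re.FindAllString(text, -1)
}

func main() {
	// Hypothetical sample: five uppercase words, then lowercase text
	// containing a run of four spaces.
	text := "THIS IS AN UPPERCASE RUN and    this is not"

	fmt.Printf("%q\n", matches(original, text))
	// ["THIS IS AN UPPERCASE RUN " "    "]
	fmt.Printf("%q\n", matches(fixed, text))
	// ["THIS IS AN UPPERCASE RUN"]
}
```

With the original pattern, the first match picks up a trailing space and the four bare spaces form a second match; with the corrected pattern, both disappear.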

## Demo on regex101

Answer 2

Score: 1

Replace the *s with +s, and your regex only matches the words in the first sentence.

A pattern quantified with * also matches the empty string. Looking at your regex and ignoring both [A-Z]*, all that remains is a sequence of spaces. Using + makes sure that there is at least one uppercase character between consecutive runs of spaces.
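One consequence of making the first word mandatory may be worth checking: {4,} then counts the words after the first one, so the pattern needs five words in total. The sketch below compares the variants in Go. The sample text and the variable names are assumptions for illustration, and the {3,} variant is not taken from the answers; it is one possible reading of "at least 4 words".

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// Only the second * replaced: the leading [A-Z]* may still be empty,
	// so a match can begin with a space.
	optionalFirst = regexp.MustCompile(`[A-Z]*(?: +[A-Z]+){4,}`)
	// Both * replaced: one mandatory word, then at least four more.
	fivePlus = regexp.MustCompile(`[A-Z]+(?: +[A-Z]+){4,}`)
	// Assumed variant: one mandatory word, then at least three more,
	// which gives four words in total.
	fourPlus = regexp.MustCompile(`[A-Z]+(?: +[A-Z]+){3,}`)
)

func main() {
	text := "x ONE TWO THREE FOUR" // hypothetical sample with four uppercase words

	fmt.Printf("%q\n", optionalFirst.FindString(text)) // " ONE TWO THREE FOUR"
	fmt.Printf("%q\n", fivePlus.FindString(text))      // ""
	fmt.Printf("%q\n", fourPlus.FindString(text))      // "ONE TWO THREE FOUR"
}
```

Which quantifier fits depends on whether the first word should count toward the minimum of four.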

Answer 3

Score: 1

You have to require at least one uppercase letter after the spaces: [A-Z]*(?: +[A-Z]+){4,} (see updated regex).

A better regex also allows the spaces to be absent: [A-Z]*(?: *[A-Z]+){4,} (see better regex).

The * after the space means that an uppercase letter is still accepted even when no space precedes it.
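Since the space becomes optional in the second pattern, it may help to see what else that pattern accepts. The Go sketch below compares the two patterns from this answer on a few hypothetical inputs; the variable names `strict` and `loose` are assumptions for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// At least one space is required before each following word.
	strict = regexp.MustCompile(`[A-Z]*(?: +[A-Z]+){4,}`)
	// Spaces are optional, so four repetitions can be satisfied by
	// four single letters of the same word.
	loose = regexp.MustCompile(`[A-Z]*(?: *[A-Z]+){4,}`)
)

func main() {
	// Hypothetical inputs: five words, two words, and a single word.
	samples := []string{"ONE TWO THREE FOUR FIVE", "HELLO WORLD", "NASA"}
	for _, s := range samples {
		fmt.Printf("%q: strict=%q loose=%q\n", s, strict.FindString(s), loose.FindString(s))
	}
	// "ONE TWO THREE FOUR FIVE": strict="ONE TWO THREE FOUR FIVE" loose="ONE TWO THREE FOUR FIVE"
	// "HELLO WORLD": strict="" loose="HELLO WORLD"
	// "NASA": strict="" loose="NASA"
}
```

In this sketch the looser pattern also accepts any single uppercase word of four or more letters, so whether it counts as an improvement depends on the input it is meant to handle.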

huangapple
  • Published on August 30, 2017, 20:20:42
  • When reposting, please keep the link to this article: https://go.coder-hub.com/45960277.html