August 30, 2017, 20:20:42 · go

Capturing uppercase words in text with regex

Question

I'm trying to find uppercase words in a given piece of text. The words must appear one after the other to be considered, and there must be at least 4 of them.

I have an "almost" working regex, but it captures much more: [A-Z]*(?: +[A-Z]*){4,}. The match also includes spaces at the start or the end of those words (like a boundary).

I have a playground if you want to test it out: https://regex101.com/r/BmXHFP/2

Is there a way to make the regex in the example capture only the words in the first sentence? The language I'm using is Go, and it has no look-behind/look-ahead.
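To make the problem concrete, here is a minimal Go sketch that runs the pattern from the question unchanged. The sample text and the helper name findOriginal are assumptions made for illustration; this is not the text behind the regex101 link.

```go
package main

import (
	"fmt"
	"regexp"
)

// original is the pattern from the question, unchanged.
var original = regexp.MustCompile(`[A-Z]*(?: +[A-Z]*){4,}`)

// findOriginal (hypothetical helper) returns every non-overlapping match
// of the question's pattern in text.
func findOriginal(text string) []string {
	return original.FindAllString(text, -1)
}

func main() {
	// Hypothetical sample: one uppercase run, then a gap of four spaces.
	text := "THIS IS A SENTENCE IN CAPS. But    wide gaps match too."
	for _, m := range findOriginal(text) {
		fmt.Printf("%q\n", m)
	}
}
```

With this sample, the pattern should report the uppercase run and also the bare run of four spaces, because each repetition of the group can be satisfied by a single space.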

Answer 1

Score: 2

In your regex, you just need to change the second * to a +:

[A-Z]*(?: +[A-Z]+){4,}

### Explanation
With (?: +[A-Z]*), you are matching "one or more spaces followed by zero or more uppercase letters", so the group can match spaces on their own. After replacing the * with a +, spaces are matched only when uppercase letters follow them.
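As a rough check of this change in Go, the sketch below runs the adjusted pattern over a made-up sample. The sample text and the helper name findFixed are assumptions for illustration only.

```go
package main

import (
	"fmt"
	"regexp"
)

// fixed is the pattern from this answer: the second * became a +.
var fixed = regexp.MustCompile(`[A-Z]*(?: +[A-Z]+){4,}`)

// findFixed (hypothetical helper) returns every non-overlapping match
// of the adjusted pattern in text.
func findFixed(text string) []string {
	return fixed.FindAllString(text, -1)
}

func main() {
	// Hypothetical sample text, not the one from the regex101 link.
	text := "THIS IS A SENTENCE IN CAPS. But    wide gaps match too."
	for _, m := range findFixed(text) {
		fmt.Printf("%q\n", m)
	}
}
```

On this sample only the uppercase run should be reported; the gap of four spaces no longer matches, since every repetition now needs at least one uppercase letter after its spaces.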

## Demo on regex101

Answer 2

Score: 1

Replace the *s with +s, and your regex matches only the words in the first sentence.

[A-Z]* also matches the empty string. Looking at your regex and ignoring both [A-Z]*, all that remains is a sequence of spaces. Using + makes sure that there is at least one uppercase character between consecutive runs of spaces.
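The difference between changing one quantifier and changing both can be seen in a short Go sketch. The sample text and the names onePlus, bothPlus and firstMatch are assumptions chosen for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// onePlus changes only the second quantifier; the first word stays optional.
	onePlus = regexp.MustCompile(`[A-Z]*(?: +[A-Z]+){4,}`)
	// bothPlus changes both quantifiers, so a match must begin with a letter.
	bothPlus = regexp.MustCompile(`[A-Z]+(?: +[A-Z]+){4,}`)
)

// firstMatch (hypothetical helper) returns the leftmost match of re in text,
// or "" if there is none.
func firstMatch(re *regexp.Regexp, text string) string {
	return re.FindString(text)
}

func main() {
	// Hypothetical sample: the uppercase run follows lowercase text.
	text := "note: THIS IS A SENTENCE IN CAPS ok"
	fmt.Printf("%q\n", firstMatch(onePlus, text))
	fmt.Printf("%q\n", firstMatch(bothPlus, text))
}
```

With only the second quantifier changed, the match can begin at the space before THIS, because the leading [A-Z]* is allowed to be empty. With both changed, the match begins at the first letter. Note that the two-plus form needs one leading word plus at least four more, so a run of exactly four words would not match; {3,} would be the corresponding count if four words should be enough.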

Answer 3

Score: 1

You have to require at least one uppercase letter, as in [A-Z]*(?: +[A-Z]+){4,}; see the updated regex.

A better regex would make the spaces optional, as in [A-Z]*(?: *[A-Z]+){4,}; see the better regex.

The * after the space allows at least one uppercase letter to match even when no space precedes it.
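Before adopting the looser pattern, it may help to see what it accepts. The Go sketch below is only an illustration; the sample strings and the helper name looseMatch are assumptions.

```go
package main

import (
	"fmt"
	"regexp"
)

// loose is the second pattern from this answer: the spaces are optional.
var loose = regexp.MustCompile(`[A-Z]*(?: *[A-Z]+){4,}`)

// looseMatch (hypothetical helper) returns the leftmost match of the loose
// pattern in text, or "" if there is none.
func looseMatch(text string) string {
	return loose.FindString(text)
}

func main() {
	// Hypothetical samples: a single word of four letters, then one of three.
	fmt.Printf("%q\n", looseMatch("ABCD efgh"))
	fmt.Printf("%q\n", looseMatch("ABC efgh"))
}
```

Because the spaces are optional, each repetition of the group can consume a single letter, so one uppercase word of four or more letters satisfies {4,} on its own. Whether that is desirable depends on whether a lone word such as ABCD should count as four "words" for the purposes of the question.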
