Question

I am trying to match the URL pattern string.string. with any number of string. segments. My first attempt, ^([^\\W_]+.)([^\\W_]+.)$, works for exactly two consecutive segments. But when I generalize it to ^([^\\W_]+.)+$, it stops working as intended and also matches the wrong input "string.str_ing.".
Do you know what is incorrect with the second version?

Answer 1

Score: 0

With ^([^\\W_]+.)([^\\W_]+.)$ you match exactly two words built from the restricted character set, each followed by one more character. Although you have not escaped the ., it still appears to work: the first group matches string, then any single character (that is what an unescaped . means), and the second group does the same.

In the second pattern, the unescaped dot (.) is part of a capturing group that repeats one or more times (because of the +), so any character is accepted as a separator. In other words, string.str_ing. is read as:

string as the 1st word
str as the 2nd word
ing as the 3rd word

... because the unescaped dot (.) accepts any separator (here both the literal . and _).

Escape the dot to make the regex work as intended (demo):

^([^\\W_]+\\.)+$
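The difference between the two patterns can be checked with a small Java sketch. The class name DotEscapeDemo and its helper methods are made up for illustration; only java.util.regex.Pattern comes from the standard library.

```java
import java.util.regex.Pattern;

class DotEscapeDemo {
    // Unescaped dot: '.' matches any single character, including '_'.
    static final Pattern UNESCAPED = Pattern.compile("^([^\\W_]+.)+$");
    // Escaped dot: "\\." in a Java string literal is the regex \. (a literal dot).
    static final Pattern ESCAPED = Pattern.compile("^([^\\W_]+\\.)+$");

    static boolean matchesUnescaped(String s) {
        return UNESCAPED.matcher(s).matches();
    }

    static boolean matchesEscaped(String s) {
        return ESCAPED.matcher(s).matches();
    }

    public static void main(String[] args) {
        // Compare both patterns on a valid input and on the problematic one.
        for (String s : new String[] {"string.string.", "string.str_ing."}) {
            System.out.println(s + " unescaped=" + matchesUnescaped(s)
                    + " escaped=" + matchesEscaped(s));
        }
    }
}
```

Both patterns accept string.string., but only the unescaped one accepts string.str_ing., because its dot consumes the underscore.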

Answer 2

Score: 0

You need to escape your . character, otherwise it will match any character, including _.

```regexp
^([^\\W_]+\\.?)+$
```

This can be your generalised regex. Note that the ? makes each dot optional, so this version also accepts input without a trailing dot.
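A short Java sketch of how an optional escaped dot behaves compared with a required one. The class OptionalDotDemo and its method names are hypothetical, chosen only for this illustration.

```java
import java.util.regex.Pattern;

class OptionalDotDemo {
    // Escaped dot followed by '?': each segment may or may not end with a dot.
    static final Pattern OPTIONAL_DOT = Pattern.compile("^([^\\W_]+\\.?)+$");
    // Escaped dot without '?': every segment must end with a dot.
    static final Pattern REQUIRED_DOT = Pattern.compile("^([^\\W_]+\\.)+$");

    static boolean matchesOptional(String s) {
        return OPTIONAL_DOT.matcher(s).matches();
    }

    static boolean matchesRequired(String s) {
        return REQUIRED_DOT.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] inputs = {"string.string.", "string.string", "string", "string.str_ing."};
        for (String s : inputs) {
            System.out.println(s + " optional=" + matchesOptional(s)
                    + " required=" + matchesRequired(s));
        }
    }
}
```

Both variants reject string.str_ing., since nothing in either pattern can consume the underscore. The optional-dot variant additionally accepts string.string and string, so it only fits if a missing trailing dot is acceptable.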



Answer 3

Score: 0

[^\W] seems a weird choice: it matches "not not-a-word-character". I haven't thought it through, but that sounds like it's equivalent to \w, i.e., matching a word character.

Either way, with [^\W] and \w you're asking to match underscores, which is why it matches the string with the underscore. "Word characters" are uppercase letters, lowercase letters, digits, **and underscore**.

You probably want [a-z]+ or maybe [A-Za-z0-9]+.
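To see how these character classes treat the underscore, here is a minimal Java sketch. It assumes Java's default (ASCII) definition of \w, and the class WordClassDemo and its helper are hypothetical names. It also includes [^\W_], the class used in the question, for comparison.

```java
import java.util.regex.Pattern;

class WordClassDemo {
    // Returns true if the whole input matches the given single-character class.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        String[] classes = {"\\w", "[^\\W]", "[^\\W_]", "[A-Za-z0-9]"};
        String[] inputs = {"a", "7", "_"};
        for (String c : classes) {
            for (String in : inputs) {
                System.out.println(c + " matches \"" + in + "\": " + matches(c, in));
            }
        }
    }
}
```

Under that assumption, \w and [^\W] both accept the underscore, while [^\W_] and [A-Za-z0-9] reject it and still accept letters and digits.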
