Is it possible to have an optional group between two "any" groups in regex?

Question

I have a regex statement that looks like this:

(.*)_(ce)_(.*)_([0-9]{8}).([A-Za-z]{1,20})(?:\\.[A-Za-z]{1,20})?

It's supposed to group (anything)_(ce)_(anything)_(some digits).(some_ext).(some_possible_ext).

So, this is a possible passing string:

hello_ce_world_20192212.json.xml

The groups are:

1. hello
2. ce
3. world
4. 20192212
5. json
6. xml
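One way to check what this first regex actually captures is Python's re module. This is a sketch that assumes the \\. above is Java string-literal escaping for a regex \., so it is written as \. in a raw Python string. Note that (?:...) is a non-capturing group, so in this form xml is matched but not returned as a sixth group.

```python
import re

# The question's first regex (ce mandatory). Assumption: "\\." was Java string
# escaping, so it is written as "\." here. The unescaped "." after the digits
# matches any character, including the literal dot in these names.
MANDATORY_CE = re.compile(
    r"(.*)_(ce)_(.*)_([0-9]{8}).([A-Za-z]{1,20})(?:\.[A-Za-z]{1,20})?"
)

m = MANDATORY_CE.fullmatch("hello_ce_world_20192212.json.xml")
print(m.groups())
# ('hello', 'ce', 'world', '20192212', 'json')   (".xml" is matched, not captured)

# Without "_ce_" this pattern cannot match at all.
print(MANDATORY_CE.fullmatch("hello_world_20192212.json.xml"))
# None
```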

I now want to make the (ce) optional, and make the regex look like this:

(.*)_(ce_)?(.*)_([0-9]{8}).([A-Za-z]{1,20})(?:\\.[A-Za-z]{1,20})?

Such that this would pass: hello_ce_world_20192212.json.xml, and the groups would be:

1. hello
2. ce
3. world
4. 20192212
5. json
6. xml

And this would pass: hello_world_20192212.json.xml, and the groups would be:

1. hello
3. world
4. 20192212
5. json
6. xml

So, the regex works! The problem is, when (ce_) is present in the text being evaluated, it is included in group one. So, hello_ce_world_20192212.json.xml passes the regex, but the groups are:

1. hello_ce
3. world
4. 20192212
5. json
6. xml

This violates the constraint I mentioned above. I'm not sure how to fix the regex to do this; I suspect it won't work because the optional group sits between two (.*) groups, or else my regex needs to be more specific. Thinking about it logically, it seems unlikely that I can achieve what I want... but maybe someone out there understands this better. Any help?
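The behaviour described above can be reproduced with Python's re module (again assuming \\. is Java escaping for \.). Group 1 is greedy, so the engine first makes it as long as possible and backtracks only until the rest of the pattern fits; because (ce_)? is allowed to match nothing, the first fit it finds leaves ce inside group 1.

```python
import re

# The question's optional-ce regex; group 1 is greedy (.*).
OPTIONAL_CE_GREEDY = re.compile(
    r"(.*)_(ce_)?(.*)_([0-9]{8}).([A-Za-z]{1,20})(?:\.[A-Za-z]{1,20})?"
)

with_ce = OPTIONAL_CE_GREEDY.fullmatch("hello_ce_world_20192212.json.xml")
without_ce = OPTIONAL_CE_GREEDY.fullmatch("hello_world_20192212.json.xml")

# Greedy (.*) first tries "hello_ce_world", which leaves no "_<8 digits>" for the
# rest of the pattern; it then backs off to "hello_ce", where everything fits with
# (ce_)? matching the empty string.
print(with_ce.groups())     # ('hello_ce', None, 'world', '20192212', 'json')
print(without_ce.groups())  # ('hello', None, 'world', '20192212', 'json')
```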

I have found this website helpful for testing out what groups are where and stuff.

Answer 1

Score: 4

You can make the first group non-greedy with ?, i.e. (.*?) instead of (.*). This regex should do what you need:

(.*?)_(ce)?_?(.*)_([0-9]{8})\.([A-Za-z]{1,20})?\.([A-Za-z]{1,20})?

as tested in https://regex101.com/r/MZqDPd/3
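The answer was tested on regex101; as an illustration only, the same pattern can be run through Python's re module. With the lazy (.*?), group 1 stops at the first _ after which the rest of the pattern can match, so ce lands in group 2 when it is present.

```python
import re

# The answer's regex: lazy group 1, optional (ce), optional underscore.
ANSWER = re.compile(
    r"(.*?)_(ce)?_?(.*)_([0-9]{8})\.([A-Za-z]{1,20})?\.([A-Za-z]{1,20})?"
)

for name in ("hello_ce_world_20192212.json.xml", "hello_world_20192212.json.xml"):
    print(ANSWER.fullmatch(name).groups())
# ('hello', 'ce', 'world', '20192212', 'json', 'xml')
# ('hello', None, 'world', '20192212', 'json', 'xml')
```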

Also note the adjustments that make ce optional yet still captured, without the _. Because (ce)? and _? are independently optional, inputs where only one of the two is present will also pass the regex. Be aware of this.
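Two consequences of this regex can be checked directly. Because (ce)? and _? are independent, a hypothetical name such as hello_cellar_20192212.json.xml is split into ce and llar. And since the second \. is not optional, a name with a single extension such as hello_world_20192212.json does not match at all, although the question's regex allowed the second extension to be missing. The VARIANT below is a hypothetical alternative, not part of the answer: it keeps ce and its underscore together in a non-capturing optional group (?:(ce)_)? and makes only the whole \.ext tail optional. Like the answer, it assumes the first segment contains no underscore, because the lazy (.*?) stops at the first _ that lets the rest match.

```python
import re

ANSWER = re.compile(
    r"(.*?)_(ce)?_?(.*)_([0-9]{8})\.([A-Za-z]{1,20})?\.([A-Za-z]{1,20})?"
)

# Hypothetical variant (not from the answer): "ce" only counts when followed by "_",
# and the second extension is optional as a whole (group 6 is None when absent).
VARIANT = re.compile(
    r"(.*?)_(?:(ce)_)?(.*)_([0-9]{8})\.([A-Za-z]{1,20})(?:\.([A-Za-z]{1,20}))?"
)

# (ce)? and _? are independent, so "cellar" is split into "ce" + "llar".
print(ANSWER.fullmatch("hello_cellar_20192212.json.xml").groups())
# ('hello', 'ce', 'llar', '20192212', 'json', 'xml')

# The second "\." is mandatory in ANSWER, so a single extension does not match.
print(ANSWER.fullmatch("hello_world_20192212.json"))
# None

print(VARIANT.fullmatch("hello_cellar_20192212.json.xml").groups())
# ('hello', None, 'cellar', '20192212', 'json', 'xml')
print(VARIANT.fullmatch("hello_ce_world_20192212.json.xml").groups())
# ('hello', 'ce', 'world', '20192212', 'json', 'xml')
print(VARIANT.fullmatch("hello_world_20192212.json").groups())
# ('hello', None, 'world', '20192212', 'json', None)
```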

huangapple
  • Published on October 26, 2020 at 14:42:16
  • When reposting, please keep the link to this article: https://go.coder-hub.com/64532508.html