Is it possible to have an optional group between two "any" groups in regex?
Question
I have a regex statement that looks like this:
(.*)_(ce)_(.*)_([0-9]{8}).([A-Za-z]{1,20})(?:\\.[A-Za-z]{1,20})?
It's supposed to group (anything)_(ce)_(anything)_(some digits).(some_ext).(some_possible_ext).
So, this is a possible passing string: hello_ce_world_20192212.json.xml
The groups are:
1. hello
2. ce
3. world
4. 20192212
5. json
6. xml
I now want to make the (ce) optional, and make the regex look like this:
(.*)_(ce_)?(.*)_([0-9]{8}).([A-Za-z]{1,20})(?:\\.[A-Za-z]{1,20})?
Such that this would pass: hello_ce_world_20192212.json.xml, and the groups would be:
1. hello
2. ce
3. world
4. 20192212
5. json
6. xml
And this would pass: hello_world_20192212.json.xml, and the groups would be:
1. hello
3. world
4. 20192212
5. json
6. xml
So, the regex works! The problem is, when (ce_) is present in the text being evaluated, it is included in group one. So, hello_ce_world_20192212.json.xml passes the regex, but the groups are:
1. hello_ce
3. world
4. 20192212
5. json
6. xml
This violates the constraint I mentioned above. I'm not sure how to fix the regex to make it do this; I suspect that because the optional group sits between two (.*) groups, it won't work, or my regex needs to be more specific. Thinking about it logically, it seems unlikely I can achieve what I want... but maybe someone out there has more understanding. Any help?
I have found this website helpful for testing out what groups are where and stuff.
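The behaviour described in the question can be reproduced with a short Python sketch. It assumes the doubled backslash in the pattern above comes from string-literal escaping (for example in Java), so a single backslash is used in the raw string here; the names GREEDY and groups_of are made up for illustration.

```python
import re

# Assumption: "\\." in the question is string-literal escaping, so "\." is
# what the regex engine actually sees. GREEDY is an illustrative name.
GREEDY = re.compile(
    r"(.*)_(ce_)?(.*)_([0-9]{8}).([A-Za-z]{1,20})(?:\.[A-Za-z]{1,20})?"
)

def groups_of(name):
    """Return the captured groups for name, or None when nothing matches."""
    m = GREEDY.match(name)
    return m.groups() if m else None

with_ce = groups_of("hello_ce_world_20192212.json.xml")
without_ce = groups_of("hello_world_20192212.json.xml")
print(with_ce)     # ('hello_ce', None, 'world', '20192212', 'json')
print(without_ce)  # ('hello', None, 'world', '20192212', 'json')
```

The greedy first group grabs as much as it can, and since (ce_)? is allowed to match nothing, the engine never has to give "ce" back. Note also that the final (?:...) is non-capturing, so this pattern yields five groups and the second extension is never captured.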
Answer 1
Score: 4
You can make the first capture group non-greedy by adding ? after its quantifier. This regex should do what you need:
(.*?)_(ce)?_?(.*)_([0-9]{8})\.([A-Za-z]{1,20})?\.([A-Za-z]{1,20})?
as tested at https://regex101.com/r/MZqDPd/3
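For anyone who prefers to check this outside regex101, here is a minimal Python sketch of the same pattern. The names LAZY and split_name are illustrative, and fullmatch is a choice made here to require the whole file name to match.

```python
import re

# The pattern from the answer, written as a Python raw string.
LAZY = re.compile(
    r"(.*?)_(ce)?_?(.*)_([0-9]{8})\.([A-Za-z]{1,20})?\.([A-Za-z]{1,20})?"
)

def split_name(name):
    """Return all six groups, or None when the whole name does not match."""
    m = LAZY.fullmatch(name)
    return m.groups() if m else None

print(split_name("hello_ce_world_20192212.json.xml"))
# ('hello', 'ce', 'world', '20192212', 'json', 'xml')
print(split_name("hello_world_20192212.json.xml"))
# ('hello', None, 'world', '20192212', 'json', 'xml')
```

Because the first group is lazy, it stops at the first underscore, which gives (ce)? the chance to match before anything else consumes it.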
Also note the adjustment that makes ce optional yet still captured without its trailing _. Because ce and the _ after it are now independently optional, inputs where either one is missing will still pass the regex. Be aware of this.
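The caveat can be made concrete with a small Python comparison. ANSWER is the pattern from this answer; STRICT is a hypothetical tighter variant that is not part of the answer, sketched on the assumption that "ce" should only count when its underscore follows and that the second extension should be optional as a unit.

```python
import re

ANSWER = re.compile(
    r"(.*?)_(ce)?_?(.*)_([0-9]{8})\.([A-Za-z]{1,20})?\.([A-Za-z]{1,20})?"
)
# Hypothetical variant (an assumption, not from the answer): "ce" and its
# underscore are optional together, and so are the second dot and extension.
STRICT = re.compile(
    r"(.*?)_(?:(ce)_)?(.*)_([0-9]{8})\.([A-Za-z]{1,20})(?:\.([A-Za-z]{1,20}))?"
)

def groups_of(pattern, name):
    """Return the groups when pattern matches the whole name, else None."""
    m = pattern.fullmatch(name)
    return m.groups() if m else None

odd = "hello_ceworld_20192212.json.xml"
print(groups_of(ANSWER, odd)[1:3])  # ('ce', 'world')
print(groups_of(STRICT, odd)[1:3])  # (None, 'ceworld')

single_ext = "hello_world_20192212.json"
print(groups_of(ANSWER, single_ext))  # None
print(groups_of(STRICT, single_ext))
# ('hello', None, 'world', '20192212', 'json', None)
```

Whether the stricter behaviour is wanted depends on the file names you expect. Both patterns still split on the first underscore, so a first part that itself contains underscores would need a different approach.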