I need to create a regular expression where 2 groups point to the same pattern
Question
I have 4 file names:
ABC_ALL_20230508T050011.zip
ABC_Intra1_20230508T050011.zip
ABC_Intra2_20230508T050011.zip
ABC_INT_20230508T050011.zip
I am trying to create a regex that captures ExtractId, FileName and FileDate. ExtractId and FileName both need to capture the same value from the middle position, e.g. 'ALL', 'Intra1', 'Intra2' or 'INT'.
I currently have:
(?<ExtractId>[a-zA-Z]{3})_(?<FileName>[a-zA-Z0-9]*)_(?<FileDate>[0-9A-Z]{8}).*
and it results in:
ExtractId = ABC
Matched
FileName = ALL
Matched
FileDate = 20230508
I am after:
ExtractId = ALL
Matched
FileName = ALL
Matched
FileDate = 20230508
I believe there is a way of achieving this with a regex subexpression, where 2 groups point to the same position, but I have never used it before.
Thanks
Answer 1
Score: 1
You can use a named capture group in a lookahead assertion:
(?=[a-zA-Z]{3}_(?<ExtractId>[a-zA-Z0-9]+)_)[a-zA-Z]{3}_(?<FileName>[a-zA-Z0-9]+)_(?<FileDate>[0-9A-Z]{8}).*
Explanation
(?=
Positive lookahead assertion
[a-zA-Z]{3}_
Match a-zA-Z 3 times and then match _
(?<ExtractId>[a-zA-Z0-9]+)_
Named group ExtractId, capture 1+ chars a-zA-Z0-9 and then match _
)
Close the lookahead
[a-zA-Z]{3}_
Match a-zA-Z 3 times and then match _
(?<FileName>[a-zA-Z0-9]+)
Named group FileName, capture 1+ chars a-zA-Z0-9
_
Match literally
(?<FileDate>[0-9A-Z]{8})
Named group FileDate, capture 8 chars 0-9A-Z
.*
Match the rest of the string (if you need that, else you can omit this part)
See a regex demo.
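As a rough illustration of the lookahead approach, here is a minimal Python sketch. The question does not say which regex engine is in use, so Python is an assumption. Python's re module spells named groups as (?P<name>...), so the pattern is adapted accordingly. The helper name parse is hypothetical.

```python
import re

# Assumption: Python's re module, which writes named groups as (?P<name>...)
# rather than (?<name>...). The structure of the pattern is otherwise unchanged.
PATTERN = re.compile(
    r"(?=[a-zA-Z]{3}_(?P<ExtractId>[a-zA-Z0-9]+)_)"  # lookahead: capture the middle part without consuming it
    r"[a-zA-Z]{3}_"                                  # consume the 3-letter prefix and the underscore
    r"(?P<FileName>[a-zA-Z0-9]+)_"                   # capture the same middle part again
    r"(?P<FileDate>[0-9A-Z]{8})"                     # first 8 characters of the timestamp
    r".*"                                            # rest of the file name
)

FILE_NAMES = [
    "ABC_ALL_20230508T050011.zip",
    "ABC_Intra1_20230508T050011.zip",
    "ABC_Intra2_20230508T050011.zip",
    "ABC_INT_20230508T050011.zip",
]


def parse(name):
    """Return the three named groups as a dict, or None if the name does not match."""
    match = PATTERN.match(name)
    return match.groupdict() if match else None


for name in FILE_NAMES:
    print(name, parse(name))
```

Because the lookahead consumes no input, the engine is still at the start of the name when the lookahead closes. That is why ExtractId and FileName can both capture the middle segment.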
If you want to anchor the match to the start of the string, you can prepend ^ like this:
^(?=[a-zA-Z]{3}_(?<ExtractId>[a-zA-Z0-9]+)_)[a-zA-Z]{3}_(?<FileName>[a-zA-Z0-9]+)_(?<FileDate>[0-9A-Z]{8}).*
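To see what the anchor changes, the following sketch compares the two variants when scanning a string with Python's re.search. Python is an assumption, and the pattern again uses its (?P<name>...) syntax. The file name with a stray leading character is a made-up example, not taken from the question.

```python
import re

# Assumption: Python's re module, hence the (?P<name>...) group syntax.
BODY = (
    r"(?=[a-zA-Z]{3}_(?P<ExtractId>[a-zA-Z0-9]+)_)"
    r"[a-zA-Z]{3}_(?P<FileName>[a-zA-Z0-9]+)_(?P<FileDate>[0-9A-Z]{8}).*"
)
UNANCHORED = re.compile(BODY)
ANCHORED = re.compile("^" + BODY)

# Hypothetical input: a valid name preceded by one extra character.
stray = "XABC_ALL_20230508T050011.zip"

# search() scans forward, so the unanchored pattern can match from a later index.
unanchored_match = UNANCHORED.search(stray)
# With ^ the match may only begin at index 0, where this input does not fit.
anchored_match = ANCHORED.search(stray)

print(unanchored_match.start() if unanchored_match else None)
print(anchored_match)
```

Whether the anchor matters depends on how the pattern is applied. A method that already matches only at the start of the input, such as Python's re.match, makes the ^ redundant.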
Answer 2
Score: 0
You can put the first of the two overlapping capture groups inside a lookahead assertion. A lookahead does not consume any input, so the same text is still available for the second capture group to match:
_(?=(?<ExtractId>[^_]+))(?<FileName>[^_]+)_(?<FileDate>[0-9A-Z]{8})
Demo: https://regex101.com/r/YhFyIn/2
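A small Python sketch of this shorter variant follows, assuming Python's re module and therefore its (?P<name>...) group syntax. The pattern begins at the first underscore, so it is applied with search rather than a start-anchored match. The helper name parse is hypothetical.

```python
import re

# Assumption: Python's re module, hence the (?P<name>...) group syntax.
PATTERN = re.compile(
    r"_(?=(?P<ExtractId>[^_]+))"  # after the first underscore, peek at the middle part
    r"(?P<FileName>[^_]+)_"       # then consume that same middle part and the next underscore
    r"(?P<FileDate>[0-9A-Z]{8})"  # first 8 characters of the timestamp
)


def parse(name):
    """Return the named groups as a dict, or None if nothing matches."""
    match = PATTERN.search(name)  # search: the pattern does not start at index 0
    return match.groupdict() if match else None


for name in (
    "ABC_ALL_20230508T050011.zip",
    "ABC_Intra1_20230508T050011.zip",
    "ABC_Intra2_20230508T050011.zip",
    "ABC_INT_20230508T050011.zip",
):
    print(name, parse(name))
```

Note that [^_]+ accepts any character other than an underscore, so this variant is more permissive about the middle segment than the [a-zA-Z0-9]+ class used in the first answer.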