I need to create a regular expression where 2 groups point to the same pattern

Question

I have 4 file names

    ABC_ALL_20230508T050011.zip
    ABC_Intra1_20230508T050011.zip
    ABC_Intra2_20230508T050011.zip
    ABC_INT_20230508T050011.zip

I am trying to create a regex that captures ExtractId, FileName and Date. ExtractId and FileName both need to capture the same value from the middle position, e.g. 'ALL', 'Intra1', 'Intra2' or 'INT'.

I currently have:

    (?<ExtractId>[a-zA-Z]{3})_(?<FileName>[a-zA-Z0-9]*)_(?<FileDate>[0-9A-Z]{8}).*

which results in:

    ExtractId = ABC
    Matched
    FileName = ALL
    Matched
    FileDate = 20230508
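
To reproduce this result outside a regex tester, here is a minimal Python sketch. It rests on one assumption: Python's `re` module spells named groups `(?P<name>...)` rather than `(?<name>...)`, so the pattern is adapted accordingly. The helper name `current_groups` is made up for illustration.

```python
import re

# The current pattern from the question, adapted to Python's (?P<name>...) syntax.
CURRENT = re.compile(
    r"(?P<ExtractId>[a-zA-Z]{3})_(?P<FileName>[a-zA-Z0-9]*)_(?P<FileDate>[0-9A-Z]{8}).*"
)

def current_groups(file_name):
    """Return the named groups captured by the current pattern, or None."""
    m = CURRENT.match(file_name)
    return m.groupdict() if m else None

# ExtractId comes back as the 3-letter prefix, not the middle part.
print(current_groups("ABC_Intra1_20230508T050011.zip"))
# {'ExtractId': 'ABC', 'FileName': 'Intra1', 'FileDate': '20230508'}
```

ExtractId ends up as the prefix because that group sits at the start of the pattern, so it consumes the first three letters before FileName is ever reached.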

I am after:

    ExtractId = ALL
    Matched
    FileName = ALL
    Matched
    FileDate = 20230508

I believe this can be achieved with a regex subexpression that lets 2 groups point to the same position, but I have never used one before.

Thanks

Answer 1

Score: 1

You can use a named capture group in a lookahead assertion:

    (?=[a-zA-Z]{3}_(?<ExtractId>[a-zA-Z0-9]+)_)[a-zA-Z]{3}_(?<FileName>[a-zA-Z0-9]+)_(?<FileDate>[0-9A-Z]{8}).*

Explanation

  • (?= Positive lookahead assertion
    • [a-zA-Z]{3}_ Match 3 characters a-zA-Z, then match _
    • (?<ExtractId>[a-zA-Z0-9]+)_ Named group ExtractId, capture 1+ chars a-zA-Z0-9, then match _
  • ) Close the lookahead
  • [a-zA-Z]{3}_ Match 3 characters a-zA-Z, then match _
  • (?<FileName>[a-zA-Z0-9]+) Named group FileName, capture 1+ chars a-zA-Z0-9
  • _ Match literally
  • (?<FileDate>[0-9A-Z]{8}) Named group FileDate, capture 8 chars 0-9A-Z
  • .* Match the rest of the string (if you need that, else you can omit this part)
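
The breakdown above can be tried out with a short Python sketch. This is an illustration under one assumption: Python's `re` module needs `(?P<name>...)` for named groups, so the pattern is rewritten in that syntax. `FILES` and `parse` are hypothetical names, and the file names are the four from the question.

```python
import re

# Lookahead pattern from this answer, adapted to Python's (?P<name>...) syntax.
PATTERN = re.compile(
    r"(?=[a-zA-Z]{3}_(?P<ExtractId>[a-zA-Z0-9]+)_)"  # peek ahead and capture the middle part
    r"[a-zA-Z]{3}_(?P<FileName>[a-zA-Z0-9]+)_"       # consume the same text, capturing it again
    r"(?P<FileDate>[0-9A-Z]{8}).*"                   # first 8 chars of the timestamp, then the rest
)

FILES = [
    "ABC_ALL_20230508T050011.zip",
    "ABC_Intra1_20230508T050011.zip",
    "ABC_Intra2_20230508T050011.zip",
    "ABC_INT_20230508T050011.zip",
]

def parse(file_name):
    """Return the three named groups as a dict, or None if there is no match."""
    m = PATTERN.search(file_name)
    return m.groupdict() if m else None

for name in FILES:
    print(name, parse(name))
```

Because the lookahead consumes nothing, the consuming part of the pattern starts again from the same position, which is why both groups can capture the same middle segment.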

See a regex demo.

If you want to anchor the match to the start of the string, you can prepend ^:

    ^(?=[a-zA-Z]{3}_(?<ExtractId>[a-zA-Z0-9]+)_)[a-zA-Z]{3}_(?<FileName>[a-zA-Z0-9]+)_(?<FileDate>[0-9A-Z]{8}).*
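
To see what the anchor changes, here is a small Python sketch comparing the two variants. It assumes Python's `(?P<name>...)` spelling for named groups, and the input `backup-ABC_INT_20230508T050011.zip` is an invented example, not one of the question's files.

```python
import re

# Shared pattern body, adapted to Python's (?P<name>...) syntax.
BODY = (
    r"(?=[a-zA-Z]{3}_(?P<ExtractId>[a-zA-Z0-9]+)_)"
    r"[a-zA-Z]{3}_(?P<FileName>[a-zA-Z0-9]+)_(?P<FileDate>[0-9A-Z]{8}).*"
)
UNANCHORED = re.compile(BODY)
ANCHORED = re.compile("^" + BODY)

def unanchored(text):
    """Search anywhere in the text; return the named groups or None."""
    m = UNANCHORED.search(text)
    return m.groupdict() if m else None

def anchored(text):
    """Only accept a match that begins at the start of the text."""
    m = ANCHORED.search(text)
    return m.groupdict() if m else None

sample = "backup-ABC_INT_20230508T050011.zip"  # hypothetical prefixed name
print(unanchored(sample))  # matches, starting at "ABC"
print(anchored(sample))    # None: the text does not start with 3 letters and "_"
```

Without the anchor the engine is free to start the match later in the string, so a name with extra leading text is still accepted.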

Answer 2

Score: 0

You can put the first of the overlapping capture groups inside a lookahead assertion. A lookahead consumes no characters, so the same text is still available for the second overlapping group to match:

    _(?=(?<ExtractId>[^_]+))(?<FileName>[^_]+)_(?<FileDate>[0-9A-Z]{8})

Demo: https://regex101.com/r/YhFyIn/2
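
As a rough check of this shorter pattern, the following Python sketch applies it to one of the question's file names. It assumes Python's `(?P<name>...)` syntax for named groups, and `parse_short` is a hypothetical helper name.

```python
import re

# Shorter pattern from this answer, adapted to Python's (?P<name>...) syntax.
SHORT = re.compile(
    r"_(?=(?P<ExtractId>[^_]+))"   # after the first "_", peek at the next segment
    r"(?P<FileName>[^_]+)_"        # then consume that same segment
    r"(?P<FileDate>[0-9A-Z]{8})"   # first 8 chars of the timestamp
)

def parse_short(file_name):
    """Return the named groups found by the shorter pattern, or None."""
    m = SHORT.search(file_name)
    return m.groupdict() if m else None

print(parse_short("ABC_Intra1_20230508T050011.zip"))
# {'ExtractId': 'Intra1', 'FileName': 'Intra1', 'FileDate': '20230508'}
```

This variant does not check the 3-letter prefix at all; it simply starts at the first underscore, so whether that is acceptable depends on how strictly the file names need to be validated.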

huangapple
  • Posted on May 10, 2023, 15:46:54
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76216059.html