regexp: multiline, non-greedy match until optional string

Question

Using Go's regexp package, I'm trying to extract a predefined set of ordered key-value pairs (with multiline values) from raw text, where the last pair may be absent, e.g.,

 Key1:
  SomeValue1
  MoreValue1
 Key2:
  SomeValue2
  MoreValue2
 OptionalKey3:
  SomeValue3
  MoreValue3

(here, I want to extract all the values as named groups)

If I use the default greedy pattern (?s:Key1:\n(?P<Key1>.*)Key2:\n(?P<Key2>.*)(?:OptionalKey3:\n(?P<OptionalKey3>.*))?), it never sees OptionalKey3 and matches the rest of the text as Key2.

If I use the non-greedy pattern (?s:Key1:\n(?P<Key1>.*)Key2:\n(?P<Key2>.*?)(?:OptionalKey3:\n(?P<OptionalKey3>.*))?), the Key2 group doesn't even capture SomeValue2 and the match stops immediately: https://regex101.com/r/QE2g3o/1

Is there a way to optionally match OptionalKey3 while still capturing all the other values?
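
The two behaviours described above can be reproduced with a short Go sketch. The helper name `key2` and the `sample` constant are made up for illustration, and `SubexpIndex` assumes Go 1.15 or later.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// Greedy Key2: .* runs to the end of the text, so the optional group is left empty.
	greedy = regexp.MustCompile(`(?s:Key1:\n(?P<Key1>.*)Key2:\n(?P<Key2>.*)(?:OptionalKey3:\n(?P<OptionalKey3>.*))?)`)
	// Lazy Key2: .*? is satisfied with zero characters, and nothing forces it to grow.
	lazy = regexp.MustCompile(`(?s:Key1:\n(?P<Key1>.*)Key2:\n(?P<Key2>.*?)(?:OptionalKey3:\n(?P<OptionalKey3>.*))?)`)
)

// sample mirrors the example text from the question.
const sample = "Key1:\n SomeValue1\n MoreValue1\nKey2:\n SomeValue2\n MoreValue2\nOptionalKey3:\n SomeValue3\n MoreValue3\n"

// key2 (hypothetical helper) returns what the named group Key2 captured,
// or an empty string if the pattern does not match.
func key2(re *regexp.Regexp, text string) string {
	m := re.FindStringSubmatch(text)
	if m == nil {
		return ""
	}
	return m[re.SubexpIndex("Key2")]
}

func main() {
	fmt.Printf("greedy Key2: %q\n", key2(greedy, sample))
	fmt.Printf("lazy   Key2: %q\n", key2(lazy, sample))
}
```

The greedy variant should report everything after "Key2:\n", including the OptionalKey3 section, while the lazy variant should report an empty string.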

Answer 1

Score: 2

Use

(?s)\AKey1:\n(?P<Key1>.*)Key2:\n(?P<Key2>.*?)(?:OptionalKey3:\n(?P<OptionalKey3>.*))?\z

See regex proof.

EXPLANATION

--------------------------------------------------------------------------------
  (?s)                     set flags for this block (with . matching
                           \n) (case-sensitive) (with ^ and $
                           matching normally) (matching whitespace
                           and # normally)
--------------------------------------------------------------------------------
  \A                       the beginning of the string
--------------------------------------------------------------------------------
  Key1:                    'Key1:'
--------------------------------------------------------------------------------
  \n                       '\n' (newline)
--------------------------------------------------------------------------------
  (?P<Key1>                group and capture to "Key1":
--------------------------------------------------------------------------------
    .*                       any character (0 or more times (matching
                             the most amount possible))
--------------------------------------------------------------------------------
  )                        end of "Key1"
--------------------------------------------------------------------------------
  Key2:                    'Key2:'
--------------------------------------------------------------------------------
  \n                       '\n' (newline)
--------------------------------------------------------------------------------
  (?P<Key2>                group and capture to "Key2":
--------------------------------------------------------------------------------
    .*?                      any character (0 or more times (matching
                             the least amount possible))
--------------------------------------------------------------------------------
  )                        end of "Key2"
--------------------------------------------------------------------------------
  (?:                      group, but do not capture (optional
                           (matching the most amount possible)):
--------------------------------------------------------------------------------
    OptionalKey3:            'OptionalKey3:'
--------------------------------------------------------------------------------
    \n                       '\n' (newline)
--------------------------------------------------------------------------------
    (?P<OptionalKey3>        group and capture to "OptionalKey3":
--------------------------------------------------------------------------------
      .*                       any character (0 or more times
                               (matching the most amount possible))
--------------------------------------------------------------------------------
    )                        end of "OptionalKey3"
--------------------------------------------------------------------------------
  )?                       end of grouping
--------------------------------------------------------------------------------
  \z                       the end of the string
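
As a rough illustration of how this pattern might be used from Go, here is a minimal sketch. The helper name `extract` and the two sample inputs are assumptions made for the example; only the pattern itself comes from the answer. The key point is that `\A` and `\z` force the match to span the whole input, so the lazy Key2 group has to keep growing until either "OptionalKey3:\n" or the end of the text is reached.

```go
package main

import (
	"fmt"
	"regexp"
)

// pattern is the expression from the answer, anchored to the whole input.
var pattern = regexp.MustCompile(`(?s)\AKey1:\n(?P<Key1>.*)Key2:\n(?P<Key2>.*?)(?:OptionalKey3:\n(?P<OptionalKey3>.*))?\z`)

// extract (a hypothetical helper) returns the named captures as a map,
// or nil if the text does not match at all.
func extract(text string) map[string]string {
	m := pattern.FindStringSubmatch(text)
	if m == nil {
		return nil
	}
	out := make(map[string]string)
	for i, name := range pattern.SubexpNames() {
		if name != "" { // index 0 and unnamed groups have an empty name
			out[name] = m[i]
		}
	}
	return out
}

func main() {
	withOptional := "Key1:\n SomeValue1\n MoreValue1\nKey2:\n SomeValue2\n MoreValue2\nOptionalKey3:\n SomeValue3\n MoreValue3\n"
	withoutOptional := "Key1:\n SomeValue1\n MoreValue1\nKey2:\n SomeValue2\n MoreValue2\n"

	for _, text := range []string{withOptional, withoutOptional} {
		got := extract(text)
		fmt.Printf("Key1=%q Key2=%q OptionalKey3=%q\n", got["Key1"], got["Key2"], got["OptionalKey3"])
	}
}
```

When the OptionalKey3 section is missing, `FindStringSubmatch` reports the unmatched group as an empty string, so callers that need to tell "absent" from "present but empty" would have to use `FindStringSubmatchIndex` instead and check for a -1 offset.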

huangapple
  • Posted on June 24, 2021 at 06:31:12
  • When reposting, please keep the link to this article: https://go.coder-hub.com/68107667.html