regexp: multiline, non-greedy match until optional string
Question
Using Go's regexp package, I'm trying to extract a predefined, ordered set of key-value pairs (with multiline values) from raw text. The last pair may be absent. For example:
Key1:
SomeValue1
MoreValue1
Key2:
SomeValue2
MoreValue2
OptionalKey3:
SomeValue3
MoreValue3
(Here, I want to extract all the values as named groups.)
If I use the default greedy pattern
(?s:Key1:\n(?P<Key1>.*)Key2:\n(?P<Key2>.*)(?:OptionalKey3:\n(?P<OptionalKey3>.*))?)
it never sees OptionalKey3 and matches the rest of the text as Key2.
If I use the non-greedy pattern
(?s:Key1:\n(?P<Key1>.*)Key2:\n(?P<Key2>.*?)(?:OptionalKey3:\n(?P<OptionalKey3>.*))?)
it doesn't even see SomeValue2 and stops immediately: https://regex101.com/r/QE2g3o/1
Is there a way to match OptionalKey3 optionally while still capturing all the other values?
Answer 1
Score: 2
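Anchor the pattern with \A and \z, so the non-greedy Key2 group has to keep expanding until the rest of the input is accounted for. Use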
(?s)\AKey1:\n(?P<Key1>.*)Key2:\n(?P<Key2>.*?)(?:OptionalKey3:\n(?P<OptionalKey3>.*))?\z
See regex proof.
EXPLANATION
--------------------------------------------------------------------------------
(?s) set flags for this block (with . matching
\n) (case-sensitive) (with ^ and $
matching normally) (matching whitespace
and # normally)
--------------------------------------------------------------------------------
\A the beginning of the string
--------------------------------------------------------------------------------
Key1: 'Key1:'
--------------------------------------------------------------------------------
\n '\n' (newline)
--------------------------------------------------------------------------------
(?P<Key1> group and capture to "Key1":
--------------------------------------------------------------------------------
.* any character (0 or more times (matching
the most amount possible))
--------------------------------------------------------------------------------
) end of "Key1"
--------------------------------------------------------------------------------
Key2: 'Key2:'
--------------------------------------------------------------------------------
\n '\n' (newline)
--------------------------------------------------------------------------------
(?P<Key2> group and capture to "Key2":
--------------------------------------------------------------------------------
.*? any character (0 or more times (matching
the least amount possible))
--------------------------------------------------------------------------------
) end of "Key2"
--------------------------------------------------------------------------------
(?: group, but do not capture (optional
(matching the most amount possible)):
--------------------------------------------------------------------------------
OptionalKey3: 'OptionalKey3:'
--------------------------------------------------------------------------------
\n '\n' (newline)
--------------------------------------------------------------------------------
(?P<OptionalKey3> group and capture to "OptionalKey3":
--------------------------------------------------------------------------------
.* any character (0 or more times
(matching the most amount possible))
--------------------------------------------------------------------------------
) end of "OptionalKey3"
--------------------------------------------------------------------------------
)? end of grouping
--------------------------------------------------------------------------------
\z the end of the string
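As a minimal sketch of how the anchored pattern could be used from Go, the snippet below compiles it once and collects the named groups into a map. The names kvPattern and extract, and the sample inputs in main, are illustrative assumptions and not part of the original answer. A group that did not take part in the match (OptionalKey3 when that key is absent) is simply left out of the map.

```go
package main

import (
	"fmt"
	"regexp"
)

// kvPattern is the anchored pattern from the answer. Because of \A and \z,
// the lazy Key2 group cannot stop early: the whole input must be matched.
// (kvPattern is a hypothetical name chosen for this sketch.)
var kvPattern = regexp.MustCompile(
	`(?s)\AKey1:\n(?P<Key1>.*)Key2:\n(?P<Key2>.*?)(?:OptionalKey3:\n(?P<OptionalKey3>.*))?\z`)

// extract returns the named groups that participated in the match,
// or nil if the text does not match the pattern at all.
// (extract is a hypothetical helper, not a regexp package function.)
func extract(text string) map[string]string {
	m := kvPattern.FindStringSubmatchIndex(text)
	if m == nil {
		return nil
	}
	out := make(map[string]string)
	for i, name := range kvPattern.SubexpNames() {
		// Skip unnamed groups and groups that did not participate (index -1).
		if name == "" || m[2*i] < 0 {
			continue
		}
		out[name] = text[m[2*i]:m[2*i+1]]
	}
	return out
}

func main() {
	// Sample inputs modelled on the question: one with the optional key, one without.
	with := "Key1:\nSomeValue1\nMoreValue1\nKey2:\nSomeValue2\nMoreValue2\nOptionalKey3:\nSomeValue3\nMoreValue3\n"
	without := "Key1:\nSomeValue1\nMoreValue1\nKey2:\nSomeValue2\nMoreValue2\n"

	for _, text := range []string{with, without} {
		for name, value := range extract(text) {
			fmt.Printf("%s = %q\n", name, value)
		}
		fmt.Println("---")
	}
}
```

Each captured value keeps its trailing newline, so strings.TrimSpace may be worth applying to the results if that matters for your use case.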