Match all text occurring between specific strings in a non-greedy way

Question


I have a very long text that contains some sparsely hidden messages. Every hidden message is placed between keywords (let's call them string1 and string2). I am trying to come up with a regex (in Python) that will give me all the hidden messages at once. Let me give an example to be clear:

Say we have the text:

Hello @9>ruur>9@ dafa  @9>ruerur>9@ @>ru@9>ruour>9@ur>@, @9>   @9>  @9> @9>@9>opp>9@@9>@9>iiiii>9@@9>@9>@9>@9> >9@>9@>9@>9@9>9@9>@

And the keywords are string1 = "@9>" and string2 = ">9@". Then the regex should match the following (shown in bold; a bold underscore stands for a matched space):

<pre>
Hello @9><b>ruur</b>>9@ dafa  @9><b>ruerur</b>>9@ @>ru@9><b>ruour</b>>9@ur>@, @9>   @9>  @9> @9>@9><b>opp</b>>9@@9>@9><b>iiiii</b>>9@@9>@9>@9>@9><b>_</b>>9@>9@>9@>9@9>9@9>@
</pre>

Please note the non-greedy behaviour, i.e. the match should never contain either string1 or string2, just whatever happens to be in between. How would the regex be modified if I only wish to match messages of length n or more?

Here is my best attempt so far:

This regex almost gets the job done: r"@9>((?!@9>).*?)>9@", but not quite, as the matches are:

<pre>
Hello @9><b>ruur</b>>9@ dafa  @9><b>ruerur</b>>9@ @>ru@9><b>ruour</b>>9@ur>@, @9><b>   @9>  @9> @9>@9>opp</b>>9@@9>@9><b>iiiii</b>>9@@9>@9>@9>@9><b>_</b>>9@>9@>9@>9@9>9@9>@
</pre>

It is not so easy to draw whitespace, so please take a look at the following picture for further clarification:

[Image: Match all text occurring between specific strings in a non-greedy way]
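
For readers who want to reproduce the behaviour described above, here is a minimal sketch that runs the attempted pattern with `re.findall` on the sample text. The variable names `text`, `attempt` and `attempt_matches` are only illustrative. It shows that the negative lookahead is checked once, right after the opening keyword, so a later "@9>" can still end up inside a match.

```python
import re

# Sample text from the question (whitespace kept exactly as posted).
text = (
    "Hello @9>ruur>9@ dafa  @9>ruerur>9@ @>ru@9>ruour>9@ur>@, "
    "@9>   @9>  @9> @9>@9>opp>9@@9>@9>iiiii>9@@9>@9>@9>@9> >9@>9@>9@>9@9>9@9>@"
)

# The attempted pattern: (?!@9>) is only tested at the first position
# after the opening keyword, not at every character of the message.
attempt = re.compile(r"@9>((?!@9>).*?)>9@")

attempt_matches = attempt.findall(text)
for m in attempt_matches:
    print(repr(m))
# 'ruur'
# 'ruerur'
# 'ruour'
# '   @9>  @9> @9>@9>opp'   <- still contains "@9>"
# 'iiiii'
# ' '
```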

Thank you in advance!

Answer 1

Score: 1


You can put a negative lookahead inside the lookbehind to reject any opening "@9>" that is immediately followed by another "@9>" (optionally after a single space). This is likely to gather exactly your matches.

(?<=(?!@9> ?@9>)@9>).+?(?=>9@)

Regex Explanation:

  • (?<=(?!@9> ?@9>)@9>): positive lookbehind for...
    • (?!@9> ?@9>): a position where the following does not match:
      • @9>: the literal "@9>"
      • " ?": an optional space
      • @9>: the literal "@9>"
    • @9>: but which does match exactly "@9>"
  • .+?: lazy match of the text you need
  • (?=>9@): positive lookahead for ">9@"

Check the demo here.
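
As a complementary sketch, and not the pattern from this answer, a "tempered dot" can be used instead: every character of the message is only consumed if neither keyword starts at that position, so the captured text cannot contain string1 or string2. The helper name `hidden_messages` and its `min_len` parameter are hypothetical; `min_len` is one possible way to handle the "length n or more" part of the question. Using `re.DOTALL` is an assumption that messages may span several lines.

```python
import re

def hidden_messages(text, start="@9>", end=">9@", min_len=1):
    """Return the texts enclosed by start/end that contain neither keyword.

    Hypothetical helper: min_len is the minimum message length (the "n"
    from the question).
    """
    s, e = re.escape(start), re.escape(end)
    # (?:(?!s|e).) consumes one character only if no keyword starts there.
    pattern = re.compile(
        rf"{s}((?:(?!{s}|{e}).){{{min_len},}}){e}",
        re.DOTALL,  # assumption: a message may contain newlines
    )
    return pattern.findall(text)

# Sample text from the question (whitespace kept exactly as posted).
text = (
    "Hello @9>ruur>9@ dafa  @9>ruerur>9@ @>ru@9>ruour>9@ur>@, "
    "@9>   @9>  @9> @9>@9>opp>9@@9>@9>iiiii>9@@9>@9>@9>@9> >9@>9@>9@>9@9>9@9>@"
)

print(hidden_messages(text))
# ['ruur', 'ruerur', 'ruour', 'opp', 'iiiii', ' ']

print(hidden_messages(text, min_len=4))
# ['ruur', 'ruerur', 'ruour', 'iiiii']
```

Because both keywords are excluded from the message, greedy and lazy quantifiers give the same result here, and multiple spaces between repeated "@9>" keywords do not change the outcome.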

huangapple
  • Posted on May 30, 2023 at 01:10:33
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76359183.html
