Match all text occurring between specific strings in a non-greedy way
Question
I have a very long text that contains some sparsely hidden messages. Every hidden message is placed between two keywords (let's call them string1 and string2). I am trying to come up with a regex (in Python) that will give me all the hidden messages at once. Let me give an example to be clear:
Say we have the text:
Hello @9>ruur>9@ dafa @9>ruerur>9@ @>ru@9>ruour>9@ur>@, @9> @9> @9> @9>@9>opp>9@@9>@9>iiiii>9@@9>@9>@9>@9> >9@>9@>9@>9@9>9@9>@
And the keywords are string1 = "@9>" and string2 = ">9@". Then the regex should match the following (shown in bold, with a bold underscore standing for a matched space):
<pre>
Hello @9><b>ruur</b>>9@ dafa @9><b>ruerur</b>>9@ @>ru@9><b>ruour</b>>9@ur>@, @9> @9> @9> @9>@9><b>opp</b>>9@@9>@9><b>iiiii</b>>9@@9>@9>@9>@9><b>_</b>>9@>9@>9@>9@9>9@9>@
</pre>
Please note the non-greedy behaviour, i.e. a match should never contain string1 or string2, only whatever happens to be between them. How would the regex be modified if I only wish to match messages of length n or more?
Here is my best attempt so far. This regex almost gets the job done: r"@9>((?!@9>).*?)>9@"
but not quite, as the matches are:
<pre>
Hello @9><b>ruur</b>>9@ dafa @9><b>ruerur</b>>9@ @>ru@9><b>ruour</b>>9@ur>@, @9><b>@9></b><b>@9> @9>@9>opp</b>>9@@9>@9><b>iiiii</b>>9@@9>@9>@9>@9><b></b>>9@>9@>9@>9@9>9@9>@
</pre>
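To make the failure concrete, the attempt can be run against the sample text. This is only a small reproduction sketch; the variable names are chosen here for illustration. The negative lookahead is checked just once, right after the opening keyword, so the lazy `.*?` is still free to run across later occurrences of "@9>".

```python
import re

# Sample text from the question, split across lines only for readability.
text = ("Hello @9>ruur>9@ dafa @9>ruerur>9@ @>ru@9>ruour>9@ur>@, "
        "@9> @9> @9> @9>@9>opp>9@@9>@9>iiiii>9@@9>@9>@9>@9> "
        ">9@>9@>9@>9@9>9@9>@")

# The attempt: (?!@9>) only guards the first character after the opening keyword.
attempt = re.compile(r"@9>((?!@9>).*?)>9@")
attempt_matches = attempt.findall(text)

# The fourth match swallows several "@9>" keywords before reaching "opp".
print(attempt_matches)
# ['ruur', 'ruerur', 'ruour', ' @9> @9> @9>@9>opp', 'iiiii', ' ']
```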
It is not so easy to draw whitespace, so please take a look at the following picture for further clarification:
Thank you in advance!
Answer 1
Score: 1
You can put a negative lookahead inside the lookbehind to reject an opening "@9>" that is immediately followed (optionally after a space) by another "@9>". This is likely to gather exactly your matches.
(?<=(?!@9> ?@9>)@9>).+?(?=>9@)
Regex explanation:
(?<=(?!@9> ?@9>)@9>) : positive lookbehind of...
    (?!@9> ?@9>) : not matching
        @9> : "@9>"
        " ?" : an optional space
        @9> : "@9>"
    @9> : but matching exactly "@9>"
.+? : lazy matching of the text you need
(?=>9@) : positive lookahead of ">9@"
Check the demo here.
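A different way to guarantee that a message never contains either keyword is to check the lookahead before every single character (sometimes called a tempered dot). The sketch below is not the regex from this answer, and the helper name `find_messages` and its parameters are made up for illustration. The `min_len` argument covers the "length n or more" part of the question by turning the repetition into `{n,}`.

```python
import re

def find_messages(text, start, end, min_len=1):
    """Return every run of text between start and end that contains neither keyword."""
    s, e = re.escape(start), re.escape(end)
    # One character, allowed only if neither keyword begins at this position.
    step = rf"(?:(?!{s}|{e}).)"
    # Require at least min_len such characters between the two keywords.
    # re.DOTALL lets a message span line breaks (an assumption about the input).
    pattern = re.compile(rf"{s}({step}{{{min_len},}}){e}", re.DOTALL)
    return pattern.findall(text)

# Sample text from the question, split across lines only for readability.
text = ("Hello @9>ruur>9@ dafa @9>ruerur>9@ @>ru@9>ruour>9@ur>@, "
        "@9> @9> @9> @9>@9>opp>9@@9>@9>iiiii>9@@9>@9>@9>@9> "
        ">9@>9@>9@>9@9>9@9>@")

print(find_messages(text, "@9>", ">9@"))
# ['ruur', 'ruerur', 'ruour', 'opp', 'iiiii', ' ']
print(find_messages(text, "@9>", ">9@", min_len=4))
# ['ruur', 'ruerur', 'ruour', 'iiiii']
```

Because the closing keyword is consumed here rather than looked ahead at, a message whose opening keyword overlaps the previous closing keyword would be skipped; that case does not occur in the sample text.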