2023-05-30 01:10:33

Match all text occurring between specific strings in a non-greedy way

Question

I have a very long text that contains some sparsely hidden messages. Every hidden message is placed between keywords (let's call them string1 and string2). I am trying to come up with a regex (in Python) that will give me all the hidden messages at once. Let me give an example to be clear:

Say we have the text:

Hello @9>ruur>9@ dafa  @9>ruerur>9@ @>ru@9>ruour>9@ur>@, @9>   @9>  @9> @9>@9>opp>9@@9>@9>iiiii>9@@9>@9>@9>@9> >9@>9@>9@>9@9>9@9>@

And the keywords are string1 = "@9>" and string2 = ">9@". Then the regex should match the following (shown in bold, with a bold underscore standing for a matched space):

<pre>
Hello @9><b>ruur</b>>9@ dafa  @9><b>ruerur</b>>9@ @>ru@9><b>ruour</b>>9@ur>@, @9>   @9>  @9> @9>@9><b>opp</b>>9@@9>@9><b>iiiii</b>>9@@9>@9>@9>@9><b>_</b>>9@>9@>9@>9@9>9@9>@
</pre>

Please note the non-greedy behaviour, i.e. the match should never contain either string1 or string2, only whatever happens to be between them. How would the regex be modified if I only wish to match messages that are of length n or more?

Here is my best attempt so far:

This regex almost gets the job done: r"@9>((?!@9>).*?)>9@", but not quite, as the matches are:

<pre>
Hello @9><b>ruur</b>>9@ dafa  @9><b>ruerur</b>>9@ @>ru@9><b>ruour</b>>9@ur>@, @9><b>___@9>__@9>_@9>@9>opp</b>>9@@9>@9><b>iiiii</b>>9@@9>@9>@9>@9><b>_</b>>9@>9@>9@>9@9>9@9>@
</pre>
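
The behaviour of this attempt can be reproduced with a short script. This is a minimal sketch: the variable names are illustrative, and the sample text is the one given above, with its runs of spaces kept as written.

```python
import re

# Sample text from the question; the runs of spaces are significant.
text = (
    "Hello @9>ruur>9@ dafa  @9>ruerur>9@ @>ru@9>ruour>9@ur>@, "
    "@9>   @9>  @9> @9>@9>opp>9@@9>@9>iiiii>9@@9>@9>@9>@9> "
    ">9@>9@>9@>9@9>9@9>@"
)

# The attempt: the lookahead is tested only once, right after the opening "@9>".
attempt = re.compile(r"@9>((?!@9>).*?)>9@")

matches = attempt.findall(text)
for m in matches:
    print(repr(m))
```

The fourth match comes out as `'   @9>  @9> @9>@9>opp'` rather than `'opp'`: the lookahead `(?!@9>)` only guards the first position after the opening keyword, so any later `@9>` is swallowed by `.*?`.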

It is not easy to draw whitespace, so please take a look at the following picture for further clarification:

Thank you in advance!

Answer 1

Score: 1

You can put a negative lookahead inside the lookbehind to reject an opening "@9>" that is followed, optionally after one space, by another "@9>". This is likely to gather exactly your matches.

(?<=(?!@9> ?@9>)@9>).+?(?=>9@)

Regex Explanation:

(?<=(?!@9> ?@9>)@9>): positive lookbehind of...
- (?!@9> ?@9>): not matching
 - @9>: @9>
 - ?: optional space
 - @9>: @9>
- @9>: but matching exactly "@9>"
.+?: lazy match of the text you need
(?=>9@): positive lookahead of ">9@"
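
As a rough illustration of how this pattern behaves in Python, here is a minimal sketch. The two sample strings are hypothetical and not taken from the question. The pattern compiles with the standard `re` module because the lookbehind has a fixed width of three characters (the nested lookahead is zero-width).

```python
import re

# Pattern from the answer, unchanged.
pattern = re.compile(r"(?<=(?!@9> ?@9>)@9>).+?(?=>9@)")

# Hypothetical sample: repeated openers separated by at most one space.
sample = "x @9> @9>@9>opp>9@ y @9>ruur>9@"
single_space_result = pattern.findall(sample)
print(single_space_result)  # ['opp', 'ruur']

# Hypothetical sample: three spaces after the first opener. The " ?" in the
# lookahead covers only one optional space, so this opener is not rejected.
spaced = "@9>   @9>opp>9@"
multi_space_result = pattern.findall(spaced)
print(multi_space_result)  # ['   @9>opp']
```

Because the pattern consists only of lookarounds around `.+?`, `findall` returns the message text directly, with no capture group needed. The second call shows that runs of more than one space between openers are not excluded by the `(?!@9> ?@9>)` check.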

Check the demo here.
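
For the "length n or more" part of the question, one possible alternative is to check for both keywords at every character instead of only next to the opener. This is a different technique from the pattern above, sometimes called a tempered dot. The sketch below is a minimal example under stated assumptions: `find_messages` and its parameters are hypothetical names, and `re.DOTALL` is added on the assumption that a message may span line breaks.

```python
import re


def find_messages(text, start="@9>", end=">9@", min_len=1):
    """Return every text enclosed by start/end that contains neither keyword."""
    s, e = re.escape(start), re.escape(end)
    # Consume one character at a time, but only while neither keyword begins
    # at the current position; {min_len,} enforces the minimum message length.
    pattern = rf"{s}((?:(?!{s}|{e}).){{{min_len},}}){e}"
    # Assumption: re.DOTALL lets a message span several lines.
    return re.findall(pattern, text, flags=re.DOTALL)


# Sample text from the question; the runs of spaces are significant.
text = (
    "Hello @9>ruur>9@ dafa  @9>ruerur>9@ @>ru@9>ruour>9@ur>@, "
    "@9>   @9>  @9> @9>@9>opp>9@@9>@9>iiiii>9@@9>@9>@9>@9> "
    ">9@>9@>9@>9@9>9@9>@"
)

print(find_messages(text))             # ['ruur', 'ruerur', 'ruour', 'opp', 'iiiii', ' ']
print(find_messages(text, min_len=4))  # ['ruur', 'ruerur', 'ruour', 'iiiii']
```

Since the inner group can never cross either keyword, each match is the shortest possible enclosed text, and raising `min_len` simply filters out the shorter messages.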
