2023年4月20日 08:15:14go评论60阅读模式

英文:

How to use regex to match paragraphs containing a specific word?

问题

I tried the answer of this post Regex. Find paragraph containing some word, in my case that would be

((?!\n\n).)*(cat)

, but this don't work.

How can I use PCRE2 regular expressions (PHP >= 7.3) to match all paragraphs in my text that contain the word "cat", where each paragraph is separated by two consecutive line breaks (It is allowed to have one line break in a paragraph but not two)?

For example, if the input text is as follow

Paragraph 1 wepfowfpo
fww efwf

Paragraph 2 wefwf32321
!@d r33tcat54, 333!..

Paragraph 3 4t4t022
-`121231ere3r3cat342232
$ 4t0g cat rdwd203  
$$333

Paragraph 4 222cocdo3

Then the desired ouput is

Paragraph 3 4t4t022
-`121231ere3r3cat342232
$ 4t0g cat rdwd203  
$$333

I tried to use something like \n\n.*(?=cat)cat.*\n\n, but this match only those lines contain "cat".

英文:

I tried the answer of this post Regex. Find paragraph containing some word, in my case that would be

((?!\n\n).)*(cat)

, but this don't work.

For example, if the input text is as follow

Paragraph 1 wepfowfpo
fww efwf

Paragraph 2 wefwf32321
!@d r33tcat54, 333!..

Paragraph 3 4t4t022
-`121231ere3r3cat342232
$ 4t0g cat rdwd203  
$$333

Paragraph 4 222cocdo3

Then the desired ouput is

Paragraph 3 4t4t022
-`121231ere3r3cat342232
$ 4t0g cat rdwd203  
$$333

I tried to use something like \n\n.*(?=cat)cat.*\n\n, but this match only those lines contain "cat".

答案1

得分: 1

Sure, here are the translated parts:

如何将字符串拆分成段落并匹配包含cat的段落。

preg_grep('/\bcat\b/i', explode("\n\n", $str));

在tio.run上查看此PHP演示 - \b表示单词边界，防止匹配到tcat5。

如果不能使用PHP函数，可以使用正则表达式的(?m) 多行模式来实现。

^(?:.+\n)*.*?\bcat\b.*(?:\n.+)*

在regex101上查看此演示 - 另外添加i标志以忽略大小写（也匹配例如Cat）。

正则表达式	解释
`(?m)`	多行模式标志，使`^`能够匹配行的开头
`^(?:.+\n)*`	从`^`开始，重复匹配`(?:`非捕获组 `)` `` 任意次，包含:<br>`.+` 贪婪地* 匹配一个或多个字符直到`\n`换行符 - 匹配段落前的部分<br>（如果可用，使用原子组而不是非捕获在这里可以更有效率：演示）
`.?\bcat\b.`	`.?`懒惰地匹配任何字符直到`\bcat\b`（使用单词边界） `.`匹配行的其余部分
`(?:\n.+)*`	匹配段落中的其余行，其中`.+`防止跳过`\n\n`

希望这有所帮助！

英文:

How about splitting the string into paragraphs and matching those containing cat.

preg_grep(&#39;/\bcat\b/i&#39;, explode(&quot;\n\n&quot;, $str));

See this PHP demo at tio.run - The word bundary \b prevents from matching tcat5.

If you can't use PHP functions, following a regex-only idea for (?m) multiline mode.

^(?:.+\n)*.*?\bcat\b.*(?:\n.+)*

See this demo at regex101 - Further add i flag to ignore case (also match e.g. Cat).

regex	explained
`(?m)`	flag for multiline mode to make `^` match line start too
`^(?:.+\n)*`	at `^` start repeat the `(?:` non capturing group `)` `` any amount* of times, containing:<br>`.+` greedily match one or more chars up to `\n` newline - part that matches lines before<br>(if available, use of atomic group instead non capture can be more efficient here: demo)
`.?\bcat\b.`	`.?` matches lazily* any characters up to `\bcat\b` (using word bundaries) `.*` rest of line
`(?:\n.+)*`	matches any remaining lines in the paragraph where `.+` prevents to skip over `\n\n`

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

使用正则表达式来匹配包含特定单词的段落如何？

问题

答案1

Java和正则表达式词法分析器

如何解析已使用反斜杠字符串化的对象数组

Golang正则表达式在文件中精确匹配行

替换 JavaScript 正则表达式中的空格和换行。

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

发表评论