How to use regex to match paragraphs containing a specific word?

Question

I tried the answer from the post "Regex. Find paragraph containing some word", which in my case would be

((?!\n\n).)*(cat)

but this doesn't work.


How can I use PCRE2 regular expressions (PHP >= 7.3) to match all paragraphs in my text that contain the word "cat", where each paragraph is separated by two consecutive line breaks (It is allowed to have one line break in a paragraph but not two)?

For example, if the input text is as follows:

Paragraph 1 wepfowfpo
fww efwf

Paragraph 2 wefwf32321
!@d r33tcat54, 333!..

Paragraph 3 4t4t022
-`121231ere3r3cat342232
$ 4t0g cat rdwd203  
$$333

Paragraph 4 222cocdo3

Then the desired output is:

Paragraph 3 4t4t022
-`121231ere3r3cat342232
$ 4t0g cat rdwd203  
$$333

I tried to use something like \n\n.*(?=cat)cat.*\n\n, but this matches only the lines that contain "cat".

Answer 1

Score: 1

How about splitting the string into paragraphs and matching those that contain "cat"?

preg_grep('/\bcat\b/i', explode("\n\n", $str));

See this PHP demo at tio.run - The word boundary \b prevents matching the cat in tcat5.
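
As a minimal sketch of the split-and-filter idea, the snippet below runs the one-liner on sample text adapted from the question. The variable names $paragraphs and $matches are only illustrative, and it assumes paragraphs are separated by exactly "\n\n" (Unix line endings).

```php
<?php
// Sample text adapted from the question; paragraphs are separated by a blank line.
// A nowdoc ('TXT') is used so that $ and ` are taken literally.
$str = <<<'TXT'
Paragraph 1 wepfowfpo
fww efwf

Paragraph 2 wefwf32321
!@d r33tcat54, 333!..

Paragraph 3 4t4t022
-`121231ere3r3cat342232
$ 4t0g cat rdwd203
$$333

Paragraph 4 222cocdo3
TXT;

// Split on blank lines (assumes "\n\n" separators, not "\r\n\r\n").
$paragraphs = explode("\n\n", $str);

// Keep only the paragraphs that contain "cat" as a whole word, ignoring case.
$matches = preg_grep('/\bcat\b/i', $paragraphs);

// preg_grep() preserves the original array keys, so reindex from 0.
$matches = array_values($matches);

print_r($matches);
```

Only the third paragraph should remain, because r33tcat54 and ere3r3cat342232 have word characters on both sides of cat. If the input may use Windows line endings, normalizing them before the explode() call is one option.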


If you can't use PHP functions, here is a regex-only idea that uses (?m) multiline mode.

^(?:.+\n)*.*?\bcat\b.*(?:\n.+)*

See this demo at regex101 - Additionally, add the i flag to ignore case (to also match e.g. Cat).

Regex explained:

(?m)
    Flag for multiline mode, so that ^ also matches at the start of each line.

^(?:.+\n)*
    Starting at ^ (a line start), repeat the non-capturing group (?: ) any number of times (*). Inside the group, .+ greedily matches one or more characters up to the \n newline. This part matches the lines of the paragraph that come before the line containing cat. (If available, using an atomic group instead of the non-capturing group can be more efficient here: demo)

.*?\bcat\b.*
    .*? lazily matches any characters up to \bcat\b (using word boundaries), and .* matches the rest of that line.

(?:\n.+)*
    Matches any remaining lines in the paragraph. Because .+ needs at least one character per line, the match cannot skip over \n\n.
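
To illustrate the regex-only variant, here is a small sketch that applies the pattern with preg_match_all. The sample text is adapted from the question, the variable names $pattern and $m are assumptions for the example, and Unix line endings are assumed.

```php
<?php
// Sample text adapted from the question (nowdoc, so $ and ` are literal).
$str = <<<'TXT'
Paragraph 1 wepfowfpo
fww efwf

Paragraph 2 wefwf32321
!@d r33tcat54, 333!..

Paragraph 3 4t4t022
-`121231ere3r3cat342232
$ 4t0g cat rdwd203
$$333

Paragraph 4 222cocdo3
TXT;

// (?m) lets ^ match at every line start; the trailing i flag ignores case.
// Single quotes keep \n and \b as regex escapes for PCRE to interpret.
$pattern = '/(?m)^(?:.+\n)*.*?\bcat\b.*(?:\n.+)*/i';

// Collect every paragraph that contains the whole word "cat".
preg_match_all($pattern, $str, $m);

// $m[0] holds the full matches, one entry per matching paragraph.
print_r($m[0]);
```

Each entry in $m[0] is a whole paragraph, since the leading group pulls in the lines before the cat line and the trailing group pulls in the lines after it, stopping at the first empty line.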

huangapple
  • Posted on April 20, 2023, 08:15:14
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76059678.html