How to use regex to match paragraphs containing a specific word?
Question
I tried the answer from the post "Regex. Find paragraph containing some word"; in my case that would be
((?!\n\n).)*(cat)
but this doesn't work.
How can I use PCRE2 regular expressions (PHP >= 7.3) to match all paragraphs in my text that contain the word "cat", where paragraphs are separated by two consecutive line breaks? (A single line break is allowed inside a paragraph, but not two.)
For example, if the input text is as follows:
Paragraph 1 wepfowfpo
fww efwf

Paragraph 2 wefwf32321
!@d r33tcat54, 333!..

Paragraph 3 4t4t022
-`121231ere3r3cat342232
$ 4t0g cat rdwd203
$$333

Paragraph 4 222cocdo3
then the desired output is:
Paragraph 3 4t4t022
-`121231ere3r3cat342232
$ 4t0g cat rdwd203
$$333
I tried something like
\n\n.*(?=cat)cat.*\n\n
but this matches only the lines that contain "cat", not the whole paragraph.
Answer 1
Score: 1
How about splitting the string into paragraphs and keeping those that contain cat?
preg_grep('/\bcat\b/i', explode("\n\n", $str));
See this PHP demo at tio.run. The word boundary \b prevents matching the cat inside tcat5.
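To make the split-and-filter idea concrete, here is a minimal sketch that runs it on the sample text from the question. The variable names ($text, $paragraphs, $matching) are illustrative and not part of the original answer, and the sketch assumes the text uses plain "\n" line endings; input with "\r\n" would need to be normalized first.

```php
<?php
// Sample text from the question; paragraphs are separated by one blank line ("\n\n").
// Single-quoted strings keep "$" and "`" literal.
$text = implode("\n", [
    'Paragraph 1 wepfowfpo',
    'fww efwf',
    '',
    'Paragraph 2 wefwf32321',
    '!@d r33tcat54, 333!..',
    '',
    'Paragraph 3 4t4t022',
    '-`121231ere3r3cat342232',
    '$ 4t0g cat rdwd203',
    '$$333',
    '',
    'Paragraph 4 222cocdo3',
]);

// Split into paragraphs, then keep only those containing the whole word "cat".
$paragraphs = explode("\n\n", $text);
$matching = preg_grep('/\bcat\b/i', $paragraphs);

// preg_grep preserves the original array keys; reindex from 0 for convenience.
$matching = array_values($matching);

// Only the "Paragraph 3" block remains: "tcat5" in paragraph 2 has no word boundary.
print_r($matching);
```

Because preg_grep keeps the keys of the input array, the unindexed result also tells you which paragraph number matched, which can be useful if positions matter.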
If you can't use PHP functions, here is a regex-only idea for (?m) multiline mode:
^(?:.+\n)*.*?\bcat\b.*(?:\n.+)*
See this demo at regex101. Additionally, add the i flag to ignore case (so that it also matches e.g. Cat).
regex | explained |
---|---|
(?m) | flag for multiline mode, so that ^ also matches at the start of each line |
^(?:.+\n)* | starting at ^, repeat the non-capturing group (?: ) any number of times (*); inside it, .+ greedily matches one or more characters up to the \n newline. This part matches the lines of the paragraph that come before the line containing cat.<br>(If available, an atomic group instead of the non-capturing group can be more efficient here: demo) |
.*?\bcat\b.* | .*? lazily matches any characters up to \bcat\b (using word boundaries); .* matches the rest of the line |
(?:\n.+)* | matches any remaining lines of the paragraph; requiring .+ after each \n prevents the match from running across \n\n |
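The regex-only variant can be tried from PHP with preg_match_all, as in the sketch below. Here the m and i flags are given as pattern modifiers rather than inline as (?m); the variable names ($text, $pattern, $found) are illustrative, and the sketch again assumes "\n" line endings.

```php
<?php
// Sample text from the question; paragraphs are separated by one blank line ("\n\n").
$text = implode("\n", [
    'Paragraph 1 wepfowfpo',
    'fww efwf',
    '',
    'Paragraph 2 wefwf32321',
    '!@d r33tcat54, 333!..',
    '',
    'Paragraph 3 4t4t022',
    '-`121231ere3r3cat342232',
    '$ 4t0g cat rdwd203',
    '$$333',
    '',
    'Paragraph 4 222cocdo3',
]);

// m: ^ matches at the start of every line; i: ignore case.
// The single-quoted PHP string passes \n and \b through to PCRE unchanged.
$pattern = '/^(?:.+\n)*.*?\bcat\b.*(?:\n.+)*/mi';

// $found[0] receives the full text of every match, one entry per paragraph.
preg_match_all($pattern, $text, $found);

foreach ($found[0] as $paragraph) {
    echo $paragraph, "\n---\n";
}
```

On this input the pattern matches the "Paragraph 3" block as a whole: the leading group consumes the lines before the one with the standalone word cat, and the trailing group picks up the remaining line until the blank line stops it.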