2023-05-07 05:44:56

How to match text in quotation marks between p tags (regex) - Calibre Search and Replace

Question

I need to reformat the text below, and to do that I need to match only the text between quotation marks that sits inside p tags (<p> and </p>).

This text below is an example:

<div class="vung_doc" id="vung_doc">
<p>Volume 1: The Mysterious Driver
</p>
<p>He picked up the pistol from the pool of blood and pointed it at the
person coming towards him, screaming, "I'll kill you!"
</p>
<p>No matter how many times he pressed the trigger, the rounds didn't budge.
The approaching figure mockingly spoke, "Haha, what a scene! The Great Detective Song Lang,
actually killing his superior and partner with his very own hands! I can't wait to see the
headlines in the newspapers tomorrow!"
</p>

I only need to match "I'll kill you!" and "Haha, what a scene! The Great Detective Song Lang, actually killing his superior and partner with his very own hands! I can't wait to see the headlines in the newspapers tomorrow!"

But most of the regexes I tried matched either all the text between quotes, `"(.*?)"`, all the text between the p tags, `<p>(.|\n)*?<\/p>`, or something in between.

I use Calibre's search and replace, so I am limited to a single-line regex. I use RegExr to test the expressions.
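To make the difficulty concrete, here is a minimal Python sketch that runs the quote-only pattern from the question over a shortened copy of the sample. Python is used purely for illustration (this is not Calibre's search dialog), and the names `SAMPLE`, `QUOTE_ONLY`, and `matches` are assumptions made for this example.

```python
import re

# Shortened copy of the sample markup from the question.
SAMPLE = '''<div class="vung_doc" id="vung_doc">
<p>Volume 1: The Mysterious Driver
</p>
<p>He picked up the pistol from the pool of blood and pointed it at the
person coming towards him, screaming, "I'll kill you!"
</p>'''

# The quote-only pattern from the question; DOTALL lets "." cross newlines.
QUOTE_ONLY = re.compile(r'"(.*?)"', re.DOTALL)

matches = QUOTE_ONLY.findall(SAMPLE)
for m in matches:
    print(repr(m))
```

The first two hits are the attribute values of the `<div>` tag, and only the third is dialogue, which is why a pattern that looks at quotes alone is not enough.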

Answer 1

Score: 1

Don't use `regex` to parse `HTML`

You cannot, and must not, parse structured text such as XML or HTML with tools designed to process raw lines of text. If you need to process XML or HTML, use an XML/HTML parser. The great majority of languages have built-in support for parsing XML, and there are dedicated tools such as xidel, xmlstarlet, or xmllint if you need a quick result from a command-line shell.

With `xidel`:

xidel -e '//p/extract(text(),"&quot;(.+)&quot;",1,"s")[.]' file

Credits to Reino.

With `xidel` and `grep`:

xidel -e '//p' file | grep -oP '"\K[^"]+'

Output

I'll kill you!
Haha, what a scene! The Great Detective Song Lang, actually killing his superior and partner with his very own hands! I can't wait to see the headlines in the newspapers tomorrow!

Here, the grep regex is applied only to the text content that xidel extracts, never to the markup itself.
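The same two-step idea (parse first, then apply the regex to text only) can be sketched in Python with the standard library's `html.parser`. This is an illustrative sketch, not the method used in the answer: the class `ParagraphText`, the function `quoted_in_paragraphs`, and the `SAMPLE` constant are hypothetical names, and the code assumes straight double quotes and non-nested `<p>` elements.

```python
import re
from html.parser import HTMLParser

SAMPLE = '''<div class="vung_doc" id="vung_doc">
<p>Volume 1: The Mysterious Driver
</p>
<p>He picked up the pistol from the pool of blood and pointed it at the
person coming towards him, screaming, "I'll kill you!"
</p>
<p>No matter how many times he pressed the trigger, the rounds didn't budge.
The approaching figure mockingly spoke, "Haha, what a scene! The Great Detective Song Lang,
actually killing his superior and partner with his very own hands! I can't wait to see the
headlines in the newspapers tomorrow!"
</p>'''


class ParagraphText(HTMLParser):
    """Collect the text content of each <p> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._buffer = None  # None means "currently outside any <p>"

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "p" and self._buffer is not None:
            self.paragraphs.append("".join(self._buffer))
            self._buffer = None

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)


def quoted_in_paragraphs(html):
    """Return quoted strings found in <p> text, with whitespace collapsed."""
    parser = ParagraphText()
    parser.feed(html)
    parser.close()
    quotes = []
    for text in parser.paragraphs:
        # The regex only ever sees paragraph text, never tags or attributes.
        for quoted in re.findall(r'"([^"]+)"', text):
            quotes.append(" ".join(quoted.split()))
    return quotes


result = quoted_in_paragraphs(SAMPLE)
for line in result:
    print(line)
```

Because the attribute values of the `<div>` never reach the regex, only the two lines of dialogue are returned. Collapsing whitespace is a design choice made here so that a quote spanning several source lines comes back as a single line.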
