How to match text in quotation marks between p tags (regex) - Calibre Search and Replace
Question
I need to do some formatting on the text below and to do so I need to match only the text between quotes inside p tags (<p> and </p>).
This text below is an example:
<div class="vung_doc" id="vung_doc">
<p>Volume 1: The Mysterious Driver
</p>
<p>He picked up the pistol from the pool of blood and pointed it at the
person coming towards him, screaming, "I'll kill you!"
</p>
<p>No matter how many times he pressed the trigger, the rounds didn't budge.
The approaching figure mockingly spoke, "Haha, what a scene! The Great Detective Song Lang,
actually killing his superior and partner with his very own hands! I can't wait to see the
headlines in the newspapers tomorrow!"
</p>
I need to match only "I'll kill you!" and "Haha, what a scene! The Great Detective Song Lang, actually killing his superior and partner with his very own hands! I can't wait to see the headlines in the newspapers tomorrow!"
But most of the regexes I tried matched all the text between quotes:
"(.*?)"
all the text between the p tags:
<p>(.|\n)*?</p>
or something in between.
I use Calibre's search and replace, so the solution has to be a single-line regex. I use RegExr to test the expressions.
Answer 1
Score: 1
Don't use regex to parse HTML
You cannot, and must not, parse structured text such as XML/HTML with tools designed to process raw text lines. If you need to process XML/HTML, use an XML/HTML parser. The great majority of languages have built-in support for parsing XML, and there are dedicated tools like xidel, xmlstarlet or xmllint if you need a quick shot from a command-line shell.
With xidel:
xidel -e '//p/extract(text(),"&quot;(.+)&quot;",1,"s")[.]' file
Credits to Reino.
With xidel and grep:
xidel -e '//p' file | grep -oP '"\K[^"]+'
Output
I'll kill you!
Haha, what a scene! The Great Detective Song Lang, actually killing his superior and partner with his very own hands! I can't wait to see the headlines in the newspapers tomorrow!
Here, I use the grep regex only on the text part.
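The same two-step idea (let a parser find the p elements, then run a regex only on their text) can also be sketched in Python with the standard library's html.parser module. This is an illustrative sketch, not part of the original answer: the class name QuoteInParagraphParser, the function quotes_in_paragraphs and the SAMPLE constant are made up for this example. It assumes straight double quotes and that every <p> in the input has a closing </p>.

```python
import re
from html.parser import HTMLParser

# Sample input copied from the question (hypothetical constant name).
SAMPLE = """<div class="vung_doc" id="vung_doc">
<p>Volume 1: The Mysterious Driver
</p>
<p>He picked up the pistol from the pool of blood and pointed it at the
person coming towards him, screaming, "I'll kill you!"
</p>
<p>No matter how many times he pressed the trigger, the rounds didn't budge.
The approaching figure mockingly spoke, "Haha, what a scene! The Great Detective Song Lang,
actually killing his superior and partner with his very own hands! I can't wait to see the
headlines in the newspapers tomorrow!"
</p>
"""


class QuoteInParagraphParser(HTMLParser):
    """Collect the text content of each <p> element (illustrative helper)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._depth = 0        # greater than 0 while inside a <p> element
        self._buffer = []      # text chunks of the paragraph being read
        self.paragraphs = []   # text of every finished paragraph

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "p" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                self.paragraphs.append("".join(self._buffer))
                self._buffer = []

    def handle_data(self, data):
        # Attribute values such as class="vung_doc" never arrive here,
        # so their quotes cannot produce false matches.
        if self._depth:
            self._buffer.append(data)


def quotes_in_paragraphs(html):
    """Return the double-quoted spans found inside <p> elements."""
    parser = QuoteInParagraphParser()
    parser.feed(html)
    parser.close()
    quotes = []
    for text in parser.paragraphs:
        # The regex only ever sees paragraph text, never markup.
        for match in re.findall(r'"([^"]+)"', text):
            # Collapse line breaks inside a quote into single spaces.
            quotes.append(" ".join(match.split()))
    return quotes


for quote in quotes_in_paragraphs(SAMPLE):
    print(quote)
```

Because the regex runs on the extracted paragraph text rather than on the raw markup, a quote that spans several source lines is still matched as one unit.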