February 9, 2023, 01:06:16

Looking for regex to find footer elements

Question

I would like to use regex to search for all instances of a footer in an epub, like the following sample:

<p class="calibre1">2  &lt;&gt;  GENERAL INTRODUCTION </p>

which follows the more general format:

<p class="calibre1">[page number from 1-1000]["  &lt;&gt;"][Title of section]</p>

My goal is to use calibre's regex to find all instances of that footer and delete them, but I've tried these expressions and none of them finds even the one example above:

<p class="calibre1">[0-9]  &lt;&gt;[^>] </p>
<p class="calibre1">[0-9]  &lt;&gt;  [\w] </p>

and even the more general:

<p class="calibre1">[\w--[\d_]]</p>
<p class="calibre1">[0-9] [.]</p>
<p class="calibre1">[0-9] *[.]</p>
<p class="calibre1">[0-9][*.]</p>

I'm new to regex and am pulling my hair out. Please help with my (mis)understanding.

Answer 1

Score: 0

This should work for what you want:

^<p[ \t]*class="calibre1">[0-9]+[^<]*&lt;&gt;[^<]*<[/]p>$
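
As a quick check outside calibre, the pattern above can be tried with Python's built-in `re` module. This is only a minimal sketch and not part of the original answer: the sample text and the helper name `strip_footers` are made up for illustration, and `re.MULTILINE` is assumed so that `^` and `$` match at line boundaries (calibre's own search settings may behave differently).

```python
import re

# Answer 1's pattern, written against the raw epub HTML,
# where the footer separator is the literal text "&lt;&gt;".
FOOTER = re.compile(
    r'^<p[ \t]*class="calibre1">[0-9]+[^<]*&lt;&gt;[^<]*<[/]p>$',
    re.MULTILINE,  # assumption: let ^ and $ match at each line boundary
)

def strip_footers(html: str) -> str:
    """Delete every line that matches the footer pattern (hypothetical helper)."""
    return FOOTER.sub("", html)

# Made-up sample: one footer between two ordinary paragraphs.
sample = (
    '<p class="calibre1">Some body text.</p>\n'
    '<p class="calibre1">2  &lt;&gt;  GENERAL INTRODUCTION </p>\n'
    '<p class="calibre1">More body text.</p>'
)

cleaned = strip_footers(sample)
print(cleaned)
```

Replacing with an empty string leaves a blank line where each footer used to be, which is harmless in HTML.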

Answer 2

Score: 0

Please try this:

^<p class="calibre1">\d{1,4}.*</p>$

^ - anchor to the start of the line
<p class="calibre1"> - the literal text to match
\d{1,4} - match 1 to 4 digits
.* - then zero or more characters
</p> - up to the closing tag
$ - anchor to the end of the line
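
The breakdown above can be checked with a short Python sketch. It is an illustration rather than part of the original answer: the `other` line is a made-up non-footer paragraph, and `re.MULTILINE` is assumed for the `^` and `$` anchors. It also runs one of the question's own attempts, which fails because a character class such as `[0-9]` or `[\w]` without a quantifier matches exactly one character.

```python
import re

footer = '<p class="calibre1">2  &lt;&gt;  GENERAL INTRODUCTION </p>'
other = '<p class="calibre1">1984 was a long year.</p>'  # made-up non-footer line

# Answer 2's pattern: 1 to 4 digits, then anything up to the closing tag.
answer2 = re.compile(r'^<p class="calibre1">\d{1,4}.*</p>$', re.MULTILINE)

# One of the question's attempts: [0-9] and [\w] each match a single character,
# so [\w] consumes only the "G" of "GENERAL" and the rest cannot match.
attempt = re.compile(r'<p class="calibre1">[0-9]  &lt;&gt;  [\w] </p>')

matches_footer = answer2.search(footer) is not None
matches_other = answer2.search(other) is not None
attempt_matches = attempt.search(footer) is not None

print(matches_footer, matches_other, attempt_matches)  # prints: True True False
```

Because this pattern does not require the `&lt;&gt;` separator, it also matches any `calibre1` paragraph that merely starts with a digit, so it may be worth reviewing the matches before deleting them.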
