Posted on May 7, 2023, 04:13:41

How to create a regex that replaces everything following an email signature with an empty string?

Question

I'm trying to create a regex that replaces everything following an email signature (beginning with the person's name) with an empty string.

My input is PII data, so typical examples look like:

xxx best, [NAME] xxx
xxx best. [NAME] xxx
xxx best, regards, [NAME] xxx
xxx best, regards, [NAME]
xxx regards, [NAME]
xxx best, regards [NAME]

I want to remove the [NAME] pattern and everything after it, but only when it is preceded by one or more of the signature patterns (for example: best, or best. or best or regards).

best and regards are of course not the only possible closings, and each of them can be followed by any of a few punctuation marks.

How can I do that?

Thanks!

Here is what I have so far:

import re

text = 'bla bla sincerely, best, [NAME] bli bli.'
text = re.sub(r'((?:thanks(?: again)?(?:\.|,|!)|thanks in advance(?:\.|,|!)|many thanks(?:\.|,|!)|(?:best,)|best regards(?:\.|,|!)?|with best regards|All the best(?:\.|,|!)?|(?:sincerely(?:,|\.))|cheers(?:\.|,|!)|have a nice day(?:\.|,|!)?)\s*)+(\[NAME\].*)', r'', text, flags=re.DOTALL|re.UNICODE|re.IGNORECASE)

The wrong output is currently:

bla bla[NAME] bli bli

The desired output is:

bla bla sincerely, best,

Answer 1

Score: 1

First, extract the phrases to a dedicated list; this ensures readability and maintainability:

phrases = [
  'thanks',
  'thanks again',
  'thanks in advance',
  'many thanks',
  'best',
  'best regards',
  'with best regards',
  'all the best',
  'sincerely',
  'cheers',
  'have a nice day'
]

...then construct the regex from that:

import re

escaped_phrases = '|'.join(re.escape(phrase) for phrase in phrases)
regex = re.compile(
    fr'((?:(?:{escaped_phrases})[.,!]?\s*)+)\[NAME].*',
    flags=re.DOTALL | re.UNICODE | re.IGNORECASE
)

Explanation:

(                # Capturing group 1, consisting of
  (?:            #    a non-capturing group
    (?:phrases)  #    that matches one of the phrases,
    [.,!]?       #    optionally followed by '.', ',' or '!',
    \s*          #    then zero or more whitespace characters,
  )+             # repeated one or more times,
)                # then
\[NAME].*        # '[NAME]' literally and anything after it.
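The annotated layout above can also be written as a runnable pattern with `re.VERBOSE`. The following is a minimal sketch, not part of the original answer: the shortened phrase list and the names `alternation` and `verbose_regex` are assumptions made for illustration. It relies on `re.escape` escaping spaces, so multi-word phrases still match even though verbose mode ignores unescaped whitespace in the pattern.

```python
import re

# A shortened stand-in for the phrase list (an assumption of this sketch).
phrases = ['thanks', 'thanks again', 'best', 'best regards', 'sincerely']

# re.escape also escapes spaces, so multi-word phrases survive re.VERBOSE,
# which otherwise ignores unescaped whitespace in the pattern.
alternation = '|'.join(re.escape(phrase) for phrase in phrases)

verbose_regex = re.compile(
    fr'''
    (                      # capturing group 1: the run of closings
      (?:
        (?:{alternation})  # one of the phrases,
        [.,!]?             # optionally followed by '.', ',' or '!',
        \s*                # then zero or more whitespace characters,
      )+                   # repeated one or more times
    )
    \[NAME\].*             # '[NAME]' literally and everything after it
    ''',
    flags=re.VERBOSE | re.DOTALL | re.IGNORECASE,
)

match = verbose_regex.search('bla bla sincerely, best, [NAME] bli bli.')
print(repr(match.group(1)))  # only the closing phrases
print(repr(match.group(0)))  # the whole match, including '[NAME]' and the rest
```

Group 1 holds only the closings, while the full match also covers the name and everything after it.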

Since we're matching both the phrases and the name, we need to give the former back with \1:

def remove_name(text):
    return regex.sub(r'\1', text)

Try it:

text = 'bla bla sincerely, best, [NAME] bli bli.'

print(regex)

'''
re.compile(
  '((?:(?:thanks|thanks again|...)[.,!]?\\s*)+)\\[NAME].*',
  re.IGNORECASE | re.UNICODE | re.DOTALL
)
'''

text = regex.sub(r'\1', text)
print(text)  # 'bla bla sincerely, best, '
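To check the approach against the sample inputs from the question, here is a hedged sketch that goes slightly beyond the answer. It makes two assumptions: it adds 'regards' to the phrase list, since the question's examples use that word on its own, and it puts `\b` before each phrase so that a phrase cannot match in the middle of a longer word. The names `signature_regex` and `strip_after_signature` are made up for this example, and the line breaks in the fourth input are a guess at the multi-line layout of the original examples.

```python
import re

# The answer's phrase list plus 'regards', which the question's examples
# use on its own (adding it is an assumption of this sketch).
phrases = [
    'thanks', 'thanks again', 'thanks in advance', 'many thanks',
    'best', 'best regards', 'with best regards', 'all the best',
    'sincerely', 'cheers', 'have a nice day', 'regards',
]

alternation = '|'.join(re.escape(phrase) for phrase in phrases)

# \b keeps a phrase from matching in the middle of a longer word.
signature_regex = re.compile(
    fr'((?:\b(?:{alternation})[.,!]?\s*)+)\[NAME\].*',
    flags=re.DOTALL | re.IGNORECASE,
)


def strip_after_signature(text):
    # Keep the closing phrases (group 1); drop '[NAME]' and what follows.
    return signature_regex.sub(r'\1', text)


examples = [
    'xxx best, [NAME] xxx',
    'xxx best. [NAME] xxx',
    'xxx best, regards, [NAME] xxx',
    'xxx best,\nregards,\n[NAME]',
    'xxx regards, [NAME]',
    'xxx best, regards [NAME]',
    'xxx see you soon, [NAME] xxx',  # no closing phrase: left unchanged
]

for example in examples:
    print(repr(strip_after_signature(example)))
```

With the answer's original list, which has no standalone 'regards', the inputs that use that word on its own would be left unchanged, because every phrase in the run directly before `[NAME]` has to come from the list.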
