Python regex: how to match everything except headings?

Question


I have this piece of text:

> 2. Shifting Your Mindset: Transforming your mindset from a negative or limited mindset to a positive one requires conscious effort and
> practice. Here are some key strategies to help you make that shift
>
> a. Self-Awareness: Start by becoming aware of your thoughts and inner
> dialogue. Notice any negative self-talk or limiting beliefs that may
> be holding you back financially.
>
> b. Reframing: Challenge and reframe negative thoughts or situations
> into positive ones. Instead of dwelling on financial setbacks, focus
> on the lessons learned and the potential opportunities they may
> present.

I want to match everything except the headings, which start with a number or a letter followed by a dot (.) and end with a colon, so the output should be:

> Transforming your mindset from a negative or limited mindset to a positive one requires conscious effort and
> practice. Here are some key strategies to help you make that shift:
>
> Start by becoming aware of your thoughts and inner
> dialogue. Notice any negative self-talk or limiting beliefs that may
> be holding you back financially.
>
> Challenge and reframe negative thoughts or situations
> into positive ones. Instead of dwelling on financial setbacks, focus
> on the lessons learned and the potential opportunities they may
> present.

So I tried the pattern (?!\s*[a-z]\.\s.*:)(?!\d+\.\s*.*:).* but it did not match the text I wanted.
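
To see what that attempt does in practice, here is a minimal check on one heading line from the text above. It is only a sketch: the variable names are illustrative, and the sample is a single line rather than the whole passage. The negative lookaheads reject a match at the very start of the line, but nothing anchors the pattern there, so the search simply starts one character later.

```python
import re

# The pattern from the question, unchanged.
attempt = re.compile(r"(?!\s*[a-z]\.\s.*:)(?!\d+\.\s*.*:).*")

line = "a. Self-Awareness: Start by becoming aware of your thoughts and inner"

# search() may begin a match at any position; the lookaheads only reject
# position 0, so the engine retries one character to the right.
m = attempt.search(line)
print(m.start())   # index where the match begins
print(m.group())   # still contains the heading text
```

The match begins at index 1 and still includes "Self-Awareness:", which is why the headings were not excluded.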

Answer 1

Score: 1

If you use re.sub you can focus on the parts that you don't want in the output.

I'll use these characteristics of what constitutes a "header":

  • It starts at the beginning of a line (possibly after some white space)
  • Its first character is alphanumeric
  • It ends with a colon, no further than 30 characters from the first (non-space) character, potentially followed by some white space.

import re

s = """Shifting Your Mindset: Transforming your mindset from a negative or limited mindset to a positive one requires conscious effort and practice. Here are some key strategies to help you make that shift:
a. Self-Awareness: Start by becoming aware of your thoughts and inner dialogue. Notice any negative self-talk or limiting beliefs that may be holding you back financially.

b. Reframing: Challenge and reframe negative thoughts or situations into positive ones. Instead of dwelling on financial setbacks, focus on the lessons learned and the potential opportunities they may present."""

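# Remove each header: from the start of a line up to and including the first
# colon (at most 30 non-colon characters after the first word character),
# plus any white space that follows the colon.
s = re.sub(r"^ *\w[^\r\n:]{0,30}:\s*", "", s, flags=re.M)
print(s)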

This outputs:

Transforming your mindset from a negative or limited mindset to a positive one requires conscious effort and practice. Here are some key strategies to help you make that shift:
Start by becoming aware of your thoughts and inner dialogue. Notice any negative self-talk or limiting beliefs that may be holding you back financially.

Challenge and reframe negative thoughts or situations into positive ones. Instead of dwelling on financial setbacks, focus on the lessons learned and the potential opportunities they may present.
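
As a sketch of how the same idea might carry over to the text in the question, where every heading begins with a number or a single lowercase letter followed by a dot, the pattern below anchors on that prefix instead of on any alphanumeric character. The names HEADING and strip_headings are made up for this illustration, the sample text is shortened, and the 30-character limit is carried over from the pattern above as an assumption.

```python
import re

# Hypothetical helper, not part of the original answer.
# A heading is assumed to be: optional indentation, a number or a single
# lowercase letter, a dot, then up to 30 non-colon characters and a colon.
HEADING = re.compile(
    r"^[ \t]*(?:\d+|[a-z])\.[ \t]+[^\r\n:]{0,30}:[ \t]*",
    flags=re.M,
)

def strip_headings(text: str) -> str:
    """Remove heading prefixes but keep the body text that follows them."""
    return HEADING.sub("", text)

sample = (
    "2. Shifting Your Mindset: Transforming your mindset requires practice.\n"
    "Here are some key strategies to help you make that shift:\n"
    "\n"
    "a. Self-Awareness: Start by becoming aware of your thoughts.\n"
    "b. Reframing: Challenge and reframe negative thoughts.\n"
)

print(strip_headings(sample))
```

Using [ \t]* rather than \s* after the colon keeps the substitution from swallowing line breaks, and requiring the "2." or "a." prefix leaves ordinary lines that merely end in a colon untouched.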

Answer 2

Score: 0


Thank you all for your feedback. I read the re documentation and ended up with this pattern:

(?<=:)\s*.*

This regex uses a lookbehind assertion, so a match can only begin right after a colon; it then takes any white space and the rest of that line. The heading in front of the colon is left out, so the match is the content that follows each heading on that line.
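
One thing worth checking before relying on this pattern is how it treats lines that contain no colon. The snippet below is a minimal sketch using two consecutive lines from the question's text; the variable names are illustrative only.

```python
import re

# The lookbehind pattern from this answer.
pattern = re.compile(r"(?<=:)\s*.*")

lines = [
    "a. Self-Awareness: Start by becoming aware of your thoughts and inner",
    "dialogue. Notice any negative self-talk or limiting beliefs that may",
]

# Search each line separately, as a per-line loop would.
results = [pattern.search(line) for line in lines]

# The first line has a colon, so the text after it is matched.
first = results[0].group().strip()
print(first)

# The second line is a continuation without a colon, so there is no match.
print(results[1])
```

On the first line the match is the text after "Self-Awareness:", while the continuation line yields None, so paragraphs that wrap over several lines would need extra handling.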

huangapple
  • Posted on May 22, 2023, 02:24:04
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76301337.html