Python regex individual negative lookaheads for different letters

Question

I'm writing a program to find words that you can write on a calculator. I've set up regex so that it'll find words that contain 'bad' letters, so we can eliminate them.

Here's the search pattern:

badLetters = "[ghjknpqrtuvwzGHJKNPQRTUVWZ]"

Those letters can't be written on my calculator.

However, function buttons like 'sin' and 'cos' exist, and 'tan' is one of them. That means I want 't' to count as a bad letter only if it is not immediately followed by 'an'.

My first try looked like this:

badLetters = "[ghjknpqrt(?!an)uvwzGHJKNPQRT(?!an)UVWZ]"

But this doesn't work. I think it's treating the (?!an) as more bad characters, rather than applying the negative lookahead.
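
That suspicion can be checked directly. Here is a minimal sketch (the name first_try is only for illustration) showing that, inside square brackets, the characters of (?!an) are plain members of the set and no lookahead is ever applied:

```python
import re

# The first attempt from the question: lookahead syntax placed inside the brackets.
first_try = re.compile(r"[ghjknpqrt(?!an)uvwzGHJKNPQRT(?!an)UVWZ]")

# Inside [...] the characters ( ? ! a n ) are ordinary set members.
print(first_try.search("(") is not None)   # True: "(" now counts as a bad letter
print(first_try.search("a") is not None)   # True: so does "a"
print(first_try.search("tan").group())     # t: the "t" in "tan" is still flagged
```

A lookahead has to sit outside the character class to act as a condition.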

So I tried giving the 't' and 'T' their own square bracket for the condition to be applied to.

badLetters = "[ghjknpqruvwzGHJKNPQRUVWZ][tT](?!an)"

That's not working either.

How can I implement several different letters with different conditions for each in regex? Putting them in the main square bracket doesn't work, and neither does giving them their own square brackets.

And then obviously after this I'm going to expand it to allow 'n' only if it's preceded by 'ta' or 'si' and such.

Thanks!

Answer 1

Score: 2

I believe using the | symbol (meaning 'or') in your regex will help.

With the regex [ghjknpqruvwzGHJKNPQRUVWZ][tT](?!an), you're looking for a 'bad letter' followed by a 't' that does not precede 'an', whereas [ghjknpqruvwzGHJKNPQRUVWZ]|(?:[tT](?!an)) looks for a 'bad letter' or a 't' that does not precede 'an'. You can then build on top of this as you suggest.
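
To make the difference between the two patterns concrete, here is a small sketch comparing them on a few words. The variable names and test words are my own, not from the question.

```python
import re

# Second attempt from the question: a bad letter, THEN a t/T not followed by "an".
sequence = re.compile(r"[ghjknpqruvwzGHJKNPQRUVWZ][tT](?!an)")

# Alternation: a bad letter, OR a t/T not followed by "an".
either = re.compile(r"[ghjknpqruvwzGHJKNPQRUVWZ]|[tT](?!an)")

print(sequence.search("tea"))          # None: no bad letter comes before the "t"
print(sequence.search("hut").group())  # ut: only matches the two-character sequence
print(either.findall("tea"))           # ['t']
print(either.findall("tan"))           # ['n']: the "t" passes, but "n" is still bad
```

Note that 'tan' is still rejected at this stage because of the 'n', which is why the 'n' rule from the question is needed as well.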

A tool like Regex101 may help with understanding and forming new regex searches.
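
As a rough sketch of building on the alternation, the 'n' rule from the question could be added with negative lookbehinds. This assumes only 'tan' and 'sin' should rescue an 'n', and the name bad is mine. Python's re module requires fixed-width lookbehinds, so each word gets its own.

```python
import re

bad = re.compile(
    r"""
      [ghjkpqruvwz]          # letters that are always bad
    | t (?!an)               # 't' is bad unless it starts 'tan'
    | n (?<!tan) (?<!sin)    # 'n' is bad unless it ends 'tan' or 'sin'
    """,
    re.VERBOSE | re.IGNORECASE,
)

for word in ["tan", "basin", "TAN", "bin", "tea"]:
    match = bad.search(word)
    print(word, "->", match.group() if match else "no bad letters")
```

The lookbehinds come after the 'n' so that they can check the three characters ending at that 'n'.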

Answer 2

Score: 0

Use verbose mode for complicated patterns, so you can use white space to break out and comment the pieces of the regex pattern. The re.IGNORECASE flag may make things simpler too.

Trying to come up with a regex that finds all "bad" letters may be difficult. For example, it would be harder to write a rule saying that 't' is bad unless it is part of 'tan' or 'sqrt'.
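
For a sense of what that rule looks like, here is a sketch covering the 't' case only (the name t_rule is hypothetical). It combines a lookahead for 'tan' with a lookbehind for 'sqrt'. The 'q' and 'r' of 'sqrt' would each need a similar rule of their own, which is where this approach starts to get unwieldy.

```python
import re

# 't' is bad unless it starts 'tan' or is the final letter of 'sqrt'.
t_rule = re.compile(r"t(?!an)(?<!sqrt)", re.IGNORECASE)

print(t_rule.search("tan"))    # None: "t" is followed by "an"
print(t_rule.search("sqrt"))   # None: "t" ends "sqrt"
print(t_rule.search("art"))    # a match object: this "t" is flagged as bad
```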

It may be easier to create a regex for acceptable letters. For example, the words on the calculator can be listed explicitly, and the individual "good" letters can be added as a group. Generally, put longer alternatives (e.g. words) before shorter words or letters. So it may look something like this:

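import re

good_letters = r"""
    ( sqrt | tan | sin | cos | log   # words on my calculator
    | exp | ln
    | [abcdefilmosxy]                # letters that can be displayed (not in badLetters)
    )+
"""

regex = re.compile(good_letters, re.VERBOSE | re.IGNORECASE)

test_string = "basin"                # example input
regex.fullmatch(test_string)         # a match object if the whole word can be typed, else None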
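
As a usage sketch, the same idea can filter a word list. The word list is made up, and the letter class here is an assumption: it is every letter that does not appear in the question's badLetters.

```python
import re

good_word = re.compile(
    r"""
    ( sqrt | tan | sin | cos | log   # calculator words, longer alternatives first
    | exp | ln
    | [abcdefilmosxy]                # assumed displayable letters
    )+
    """,
    re.VERBOSE | re.IGNORECASE,
)

words = ["solo", "cosmic", "login", "basin", "satan", "hello"]

# fullmatch only succeeds when the entire word is made of allowed pieces.
typeable = [w for w in words if good_word.fullmatch(w)]
print(typeable)   # ['solo', 'cosmic', 'basin', 'satan']
```

'login' fails because after 'log' and 'i' there is no button or letter that covers the trailing 'n'.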

huangapple
  • Posted on May 7, 2023, 19:46:17
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76193727.html