Search for strings with more than 3 letters in a row

Question

I have the following script in Python that checks how many rows in column A of my dataset contain 3 or more signs like "?" or "!" in a row:

    doublesigns = dataset[dataset["columnA"].str.contains(r"\?{3}|\!{3}", na=False)]
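A minimal sketch of the same check on a small made-up DataFrame (only the column name `columnA` comes from the question; the sample rows are invented). Writing the pattern as a raw string (`r"..."`) keeps Python from interpreting the backslashes before the regex engine sees them, and calling `.sum()` on the boolean mask gives the number of matching rows.

```python
import pandas as pd

# Hypothetical sample data; only the column name "columnA" is from the question.
dataset = pd.DataFrame({"columnA": ["Really???", "ok!", "Wow!!!!", None, "fine?"]})

# Raw string: the backslashes reach the regex engine unchanged.
pattern = r"\?{3}|\!{3}"

# na=False treats missing values (None/NaN) as "no match".
mask = dataset["columnA"].str.contains(pattern, na=False)

doublesigns = dataset[mask]   # rows with three or more "?" or "!" in a row
print(mask.sum())             # 2
print(doublesigns)
```

The mask is a plain boolean Series, so it can be reused both to filter rows and to count them.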

Now I want to do the same for 3 or more identical letters in a row, so that I can catch spelling errors such as "helllo" instead of "hello". In the script below I tried this for the first four letters of the alphabet:

    doublesigns = dataset[dataset["columnA"].str.contains("\a{3}|\b{3}|\c{3}|\d{3}", na=False)]

I get the following error:
> error: bad escape \c at position 10

It looks like the error occurs with certain letters but not with others. Does anyone know what the right script is?
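A likely explanation, assuming the pattern is written as an ordinary (non-raw) string literal as in the question: Python processes backslash escapes before `re` ever sees the pattern. `\a` and `\b` are valid string escapes (the BEL and backspace control characters), so they silently turn into other characters; `\d` is not a string escape, so it reaches `re` intact and means "digit"; `\c` is neither a string escape nor a regex escape, so `re` rejects it. The sketch below uses a small helper, `regex_error`, which is a hypothetical name made up for this example.

```python
import re

def regex_error(pattern):
    """Hypothetical helper: return re's error message, or None if the pattern compiles."""
    try:
        re.compile(pattern)
    except re.error as exc:
        return str(exc)
    return None

# In a normal string literal, Python has already replaced \a and \b:
print("\a" == "\x07", "\b" == "\x08")   # True True

# What a literal such as "\a{3}|\b{3}|\c{3}|\d{3}" becomes after Python's own
# escape processing: \a and \b are control characters, \c and \d keep their backslash.
as_python_sees_it = "\x07{3}|\x08{3}|\\c{3}|\\d{3}"
print(regex_error(as_python_sees_it))   # bad escape \c at position 10

# Even in a raw string, \c is not a valid regex escape:
print(regex_error(r"\c{3}"))            # bad escape \c at position 0

# The working pattern from the question compiles fine:
print(regex_error(r"\?{3}|\!{3}"))      # None
```

So letters like `a`, `b` and `d` do not raise an error, but they also do not mean "the letter a/b/d"; only a pattern built from character classes and backreferences checks for repeated letters.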

Answer 1

Score: 1

  1. The backslash before ? is needed because ? has a special meaning in regular expressions (it makes the preceding item optional). ! has no special meaning on its own, so escaping it is harmless but not required.

  2. If you have a|b|c|...|z, you can express that as the character class [a-z].

  3. You can capture something by surrounding it with parentheses, e.g. ([a-z]).

  4. You can use a backreference \n to match exactly the same text that the preceding capture group n matched.
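The four building blocks above can be tried out with the standard `re` module, which is the engine `pandas.Series.str.contains` uses for ordinary Python text columns. The sample strings are made up. Note the raw strings: without the `r` prefix, `"\1"` is the control character `chr(1)`, not a backreference.

```python
import re

# 1. "?" is a quantifier, so a literal question mark must be escaped as \?.
print(bool(re.search(r"\?{3}", "what???")))      # True

# 2. [a-z] is a character class: any single letter from a to z.
print(re.findall(r"[a-z]", "aB1c"))              # ['a', 'c']

# 3. Parentheses capture what they match; group(1) returns it.
first_lower = re.search(r"([a-z])", "Hello").group(1)
print(first_lower)                               # e

# 4. \1 must match exactly what group 1 captured.
print(bool(re.search(r"([a-z])\1", "hello")))    # True ("ll")
print(bool(re.search(r"([a-z])\1", "helo")))     # False

# Without the r prefix, "\1" is the control character chr(1), not a backreference.
print("\1" == chr(1))                            # True
```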

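So, a regular expression that matches the same lowercase letter three times in a row would be
([a-z])\1\1 or ([a-z])\1{2}, meaning a letter followed by two copies of itself. Because str.contains searches anywhere in the string, this also flags longer runs such as "hellllo".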
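Plugging that pattern into the question's `str.contains` call might look like the sketch below. The sample rows are invented and `triple_letters` is a hypothetical name; `columnA` is the column from the question. Two assumptions are worth stating: the column is created with `dtype=object`, so that Python's `re` engine (which supports backreferences) does the matching, since some pyarrow-backed string dtypes may use an engine without backreference support; and because the pattern contains a capture group, pandas may emit a `UserWarning` suggesting `str.extract`, which is silenced here because the boolean result is still what we want.

```python
import warnings

import pandas as pd

# Hypothetical sample data; object dtype so Python's re engine handles the regex.
dataset = pd.DataFrame(
    {"columnA": pd.Series(["helllo", "hello", "aaah", None, "Zzz"], dtype=object)}
)

# Same lowercase letter three times in a row. Raw string, so \1 reaches re intact.
pattern = r"([a-z])\1{2}"

with warnings.catch_warnings():
    # str.contains may warn that the pattern has match groups; the mask is still valid.
    warnings.simplefilter("ignore", UserWarning)
    mask = dataset["columnA"].str.contains(pattern, na=False)

triple_letters = dataset[mask]
print(triple_letters["columnA"].tolist())  # ['helllo', 'aaah']
```

"hello" (only two l's) is not flagged, and "Zzz" is not flagged either, because [a-z] only covers lowercase letters and the uppercase Z breaks the run.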

huangapple
  • This article was published on 8 March 2023 at 17:16:57
  • When reprinting, please keep the link to this article: https://go.coder-hub.com/75671230.html