Search for strings with more than 3 letters in a row
Question
I have the following script in Python that checks how many lines in column A of my dataset contain more than 3 signs like "?" or "!":
doublesigns = dataset[dataset["columnA"].str.contains("\?{3}|\!{3}", na=False)]
Now I want to do the same for more than 3 letters in a row, so that I can check errors in spelling, like "helllo" instead of "hello". In the script below I have tried this for 4 letters in the alphabet:
doublesigns = dataset[dataset["columnA"].str.contains("\a?{3}|\b{3}|\c{3}|\d{3}", na=False)]
I get the following error:
> error: bad escape \c at position 10
It looks like the error is occurring with certain letters, but not all of the letters. Does someone know what the right script is?
Answer 1
Score: 1
- The backslash before ? is needed because ? has a special meaning in regular expressions. ! is not special on its own, so escaping it is harmless but unnecessary.
- If you have a|b|c|...|z, you can express that as [a-z].
- You can capture something by surrounding it with parentheses, e.g. ([a-z]).
- You can use a backreference \n to match exactly the same text that a preceding capture group n matched.

So, a regular expression to match the same lowercase letter three times in a row would be ([a-z])\1\1 or ([a-z])\1{2}, meaning a letter followed by two copies of itself.
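
As for the error in the question: a backslash before a letter does not mean "this literal letter". \a, \b and \d already have other meanings (bell character, word boundary or backspace, digit), while \c is not an escape the re module recognizes at all, which is why only some letters fail. Below is a minimal sketch of the backreference approach using only the standard re module. The names TRIPLE_LETTER and has_tripled_letter and the sample strings are made up for illustration, and widening the class to [a-zA-Z] is an assumption about what you want to match.

```python
import re

# Raw string so the backslash reaches the regex engine unchanged.
# ([a-zA-Z]) captures one letter; \1{2} requires two more copies of it.
TRIPLE_LETTER = re.compile(r"([a-zA-Z])\1{2}")


def has_tripled_letter(text):
    """Return True if the same letter appears three or more times in a row."""
    return TRIPLE_LETTER.search(text) is not None


# Hypothetical sample values standing in for the contents of columnA.
samples = ["hello", "helllo", "wow!!!", "aaah", "AAa"]
flags = [has_tripled_letter(s) for s in samples]
print(flags)
```

The same pattern string should work in the original filter, e.g. dataset["columnA"].str.contains(r"([a-z])\1{2}", na=False). pandas may emit a warning that the pattern contains a match group; the result is still a boolean mask.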