Pandas Str Regex Replace Characters Except for Word
Question
I have a Pandas DataFrame column with values like the following:
| Name |
| --- |
| PIZZA NAME HERE ZZ HELLO |
| HELLO ZZZZZZZ WORLD |
I'd like to match every run of two or more Z's without matching inside a certain word, PIZZA in this case. I'd then use `.str.replace(regex, "")` to remove the matches. The final result would look like:
| Name |
| --- |
| PIZZA NAME HERE HELLO |
| HELLO WORLD |
I'm not the best at understanding regex, but the following doesn't work the way I expected, since it still matches the ZZ in PIZZA:
^((?!.*PIZZA.*).)*$|Z{2,}
Answer 1
Score: 1
You may want to consider using word boundary markers instead, such as the pattern `\bZ{2,}\b`. However, if you simply do `.str.replace(regex, "")`, it will leave a double space, i.e. `PIZZA NAME HERE  HELLO`.
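To see the double space concretely, here is a minimal sketch using the standard `re` module on the two sample values from the question. The variable names are made up for illustration; on a pandas column the equivalent call would be `.str.replace(pattern, "", regex=True)` (the `regex=True` flag is needed explicitly in pandas 2.0 and later).

```python
import re

# Word-boundary pattern from the answer: a run of two or more Z's
# that is not attached to any other word character.
WORD_BOUNDARY = re.compile(r"\bZ{2,}\b")

# Sample values taken from the question (hypothetical variable name).
names = ["PIZZA NAME HERE ZZ HELLO", "HELLO ZZZZZZZ WORLD"]

# Deleting the run leaves both neighbouring spaces behind.
removed = [WORD_BOUNDARY.sub("", name) for name in names]
print(removed)  # ['PIZZA NAME HERE  HELLO', 'HELLO  WORLD']
```

The ZZ inside PIZZA is left alone because there is no word boundary between I and Z or between Z and A.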
The simple fix for that is to swap the boundary markers for whitespace markers, i.e. `\sZ{2,}\s`. This consumes both surrounding spaces and thus requires `.str.replace(regex, " ")` instead, but it fails to match at the start and end of the string.
So to fix the start/end-of-string issue, we can combine the two methods with `(?:\s|\b)Z{2,}(?:\s|\b)`.
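As a rough check of the combined pattern, the sketch below applies it with the standard `re` module. The helper `drop_z_runs`, the two extra sample strings, and the final `.strip()` are assumptions added for illustration: a run at the very start or end is replaced by a single space, so trimming is one way to tidy that up (on a pandas column this would be something like `.str.replace(pattern, " ", regex=True).str.strip()`).

```python
import re

# Combined pattern from the answer: whitespace OR a word boundary on each side.
COMBINED = re.compile(r"(?:\s|\b)Z{2,}(?:\s|\b)")


def drop_z_runs(text: str) -> str:
    """Replace each standalone run of Z's with one space, then trim the ends.

    The strip() call is an addition for illustration, not part of the answer.
    """
    return COMBINED.sub(" ", text).strip()


samples = [
    "PIZZA NAME HERE ZZ HELLO",  # from the question
    "HELLO ZZZZZZZ WORLD",       # from the question
    "ZZ AT START",               # hypothetical edge case
    "AT END ZZZ",                # hypothetical edge case
]
cleaned = [drop_z_runs(s) for s in samples]
print(cleaned)
# ['PIZZA NAME HERE HELLO', 'HELLO WORLD', 'AT START', 'AT END']
```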