Pandas Str Regex Replace Characters Except for Word
Question
I have a Pandas DataFrame column with values like the following:
| Name |
|---|
| PIZZA NAME HERE ZZ HELLO |
| HELLO ZZZZZZZ WORLD |
I'd like to match every run of two or more Z's, except where it occurs inside a certain word, PIZZA in this case. I'd then use .str.replace(regex,"") to remove the matches. The final result would look like:
| Name |
|---|
| PIZZA NAME HERE HELLO |
| HELLO WORLD |
I'm not the best at understanding regex, but the following doesn't work the way I thought it would, since it still detects the ZZ in PIZZA:
^((?!.*PIZZA.*).)*$|Z{2,}
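For reference, here is a minimal sketch of what that pattern does to the two sample values. It uses Python's built-in re module rather than pandas, on the assumption that .str.replace(..., regex=True) applies the same regex semantics to each value (true for ordinary object-dtype strings; other string backends may differ).

```python
import re

# The pattern from the question, unchanged.
PATTERN = r"^((?!.*PIZZA.*).)*$|Z{2,}"

rows = ["PIZZA NAME HERE ZZ HELLO", "HELLO ZZZZZZZ WORLD"]

# re.sub stands in for .str.replace(PATTERN, "", regex=True) on each value.
results = [re.sub(PATTERN, "", row) for row in rows]

for row, result in zip(rows, results):
    print(repr(row), "->", repr(result))
```

The two alternatives are tried independently at each position. In the first row the left branch fails because the line contains PIZZA, so the right branch Z{2,} is free to match the ZZ inside PIZZA as well as the standalone ZZ. In the second row the left branch matches the entire line (it contains no PIZZA), so the whole value is replaced with an empty string.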
Answer 1
Score: 1
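You may want to consider using word boundary markers instead, as in the pattern \bZ{2,}\b. There is no word boundary between the I and the Z of PIZZA, so the ZZ inside PIZZA is left alone. However, if you simply do .str.replace(regex,"") the spaces on both sides of the match remain, which leaves a double space, i.e. PIZZA NAME HERE  HELLO.
A simple fix for that is replacing the boundary markers with whitespace markers, i.e. \sZ{2,}\s. This consumes both surrounding spaces and thus requires .str.replace(regex," ") instead. It also fails to match at the start and end of the string, where there is no whitespace on one side.
So to fix the start/end of string issue, we can combine the two methods with (?:\s|\b)Z{2,}(?:\s|\b). Used with .str.replace(regex," "), a match at the very start or end of the string leaves a single leading or trailing space, which .str.strip() removes.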
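The combined pattern can be tried out with a sketch like the following. The DataFrame is rebuilt from the sample values in the question; the third row and the variable names are hypothetical additions to exercise the start-of-string case, and the trailing .str.strip() is an extra step to drop the space left behind when a match sits at the very start or end.

```python
import pandas as pd

# Sample data from the question, plus one hypothetical row that starts with Z's.
df = pd.DataFrame(
    {
        "Name": [
            "PIZZA NAME HERE ZZ HELLO",
            "HELLO ZZZZZZZ WORLD",
            "ZZ AT THE START",
        ]
    }
)

# Whitespace or a word boundary on each side of a run of two or more Z's.
PATTERN = r"(?:\s|\b)Z{2,}(?:\s|\b)"

cleaned = (
    df["Name"]
    .str.replace(PATTERN, " ", regex=True)  # swap each match for one space
    .str.strip()                            # trim a leftover edge space
)

print(cleaned.tolist())
```

Passing regex=True explicitly matters here, because recent pandas versions treat the pattern as a literal string by default.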