Pandas Str Regex Replace Characters Except for Word

Question

I have a Pandas DataFrame column with values like the following:

Name
PIZZA NAME HERE ZZ HELLO
HELLO ZZZZZZZ WORLD

I'd like to match every run of two or more Z's, but not when it is part of a certain word (PIZZA in this case). I'd then use .str.replace(regex, "") to strip the matches out. The final result would look like:

Name
PIZZA NAME HERE HELLO
HELLO WORLD

Not the best at understanding Regex, but the following doesn't work for me like I thought it would since it still detects the ZZ in PIZZA.

^((?!.*PIZZA.*).)*$|Z{2,}
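
A minimal reproduction of the behavior described above. This is only a sketch: it uses Python's `re.sub` as a stand-in for `.str.replace(regex, "")`, and the variable names (`pattern`, `names`, `results`) are illustrative rather than taken from the original post.

```python
import re

# The pattern from the question, unchanged.
pattern = r"^((?!.*PIZZA.*).)*$|Z{2,}"

names = ["PIZZA NAME HERE ZZ HELLO", "HELLO ZZZZZZZ WORLD"]

# Apply the same substitution to each value, as a regex-based
# .str.replace would do element by element.
results = [re.sub(pattern, "", name) for name in names]

for before, after in zip(names, results):
    print(repr(before), "->", repr(after))
# 'PIZZA NAME HERE ZZ HELLO' -> 'PIA NAME HERE  HELLO'
# 'HELLO ZZZZZZZ WORLD' -> ''
```

The first alternative is anchored with `^`, so for a string containing PIZZA it fails at position 0 and the plain `Z{2,}` alternative is free to match anywhere, including inside PIZZA. For a string without PIZZA, the first alternative matches the whole line, so the entire value is removed.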

Answer 1

Score: 1

You may want to consider using word boundary markers instead, such as with the pattern \bZ{2,}\b. However, if you simply do .str.replace(regex, "") it will leave a double space where the match was, i.e. PIZZA NAME HERE  HELLO.
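
As a rough illustration of the word-boundary pattern, the sketch below rebuilds the sample column from the question. The names `df` and `word_bounded` are assumptions made for this example, and `regex=True` is passed explicitly because recent pandas releases treat the pattern as a literal string by default.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"Name": ["PIZZA NAME HERE ZZ HELLO", "HELLO ZZZZZZZ WORLD"]})

# \b only matches at the edge of a word, so the ZZ inside PIZZA is skipped.
word_bounded = df["Name"].str.replace(r"\bZ{2,}\b", "", regex=True)

print(word_bounded.tolist())
# ['PIZZA NAME HERE  HELLO', 'HELLO  WORLD']  (note the double spaces)
```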

The simple fix for that is to swap the boundary markers for whitespace markers, i.e. \sZ{2,}\s. This matches both surrounding spaces, so it requires .str.replace(regex, " ") instead. It also fails at the start and end of the string, where there is no whitespace on one side of the Z's.
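
A small sketch of where the whitespace version works and where it does not. It uses `re.sub` directly rather than pandas, and the two edge-case strings are made up for illustration.

```python
import re

pattern = r"\sZ{2,}\s"

# In the middle of a string both spaces are consumed and replaced by one.
middle = re.sub(pattern, " ", "PIZZA NAME HERE ZZ HELLO")

# Hypothetical edge cases: no whitespace on one side, so nothing matches.
at_start = re.sub(pattern, " ", "ZZ HELLO WORLD")
at_end = re.sub(pattern, " ", "HELLO WORLD ZZ")

print(repr(middle))    # 'PIZZA NAME HERE HELLO'
print(repr(at_start))  # 'ZZ HELLO WORLD'
print(repr(at_end))    # 'HELLO WORLD ZZ'
```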

So to fix the start/end of string issue, we can combine the two methods with (?:\s|\b)Z{2,}(?:\s|\b).
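
Putting the combined pattern to work might look like the sketch below. The last two rows are hypothetical additions to exercise the start and end of the string, and the trailing `.str.strip()` is an assumption on top of the answer: a match at either edge is replaced by a single space, which would otherwise be left dangling.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": [
            "PIZZA NAME HERE ZZ HELLO",  # from the question
            "HELLO ZZZZZZZ WORLD",       # from the question
            "ZZ AT START",               # hypothetical edge case
            "AT END ZZZ",                # hypothetical edge case
        ]
    }
)

# Either a whitespace character or a word boundary on each side of the Z's.
pattern = r"(?:\s|\b)Z{2,}(?:\s|\b)"

cleaned = (
    df["Name"]
    .str.replace(pattern, " ", regex=True)  # collapse each match to one space
    .str.strip()                            # drop the space left at an edge
)

print(cleaned.tolist())
# ['PIZZA NAME HERE HELLO', 'HELLO WORLD', 'AT START', 'AT END']
```

Because `\s` is tried before `\b` in each group, a surrounding space is consumed whenever one exists, and the zero-width boundary is used only at the edges of the string.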


huangapple
  • Posted on June 15, 2023, 09:31:19
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76478526.html