April 6, 2023 22:00:54

Polars .str.replace with expression or .str.split with regex

Question

I have this dataframe:

import polars as pl

sample = pl.DataFrame({"equip": ['AmuletsMedals', 'Guns, CrossbowsOff-Hands', 'Melee WeaponsShieldsOff-Hands',
     'All Armor', 'Chest Armor', 'Shields', 'All WeaponsShieldsOff-Hands']})
print(sample)

shape: (7, 1)
┌───────────────────────────────┐
│ equip                         │
│ ---                           │
│ str                           │
╞═══════════════════════════════╡
│ AmuletsMedals                 │
│ Guns, CrossbowsOff-Hands      │
│ Melee WeaponsShieldsOff-Hands │
│ All Armor                     │
│ Chest Armor                   │
│ Shields                       │
│ All WeaponsShieldsOff-Hands   │
└───────────────────────────────┘

My aim is to put a comma between words:

answer = pl.DataFrame({"equip": ['Amulets, Medals', 'Guns, Crossbows, Off-Hands', 'Melee Weapons, Shields, Off-Hands',
 'All Armor', 'Chest Armor', 'Shields', 'All Weapons, Shields, Off-Hands']})
print(answer)
shape: (7, 1)
┌─────────────────────────────────────┐
│ equip                               │
│ ---                                 │
│ str                                 │
╞═════════════════════════════════════╡
│ Amulets, Medals                     │
│ Guns, Crossbows, Off-Hands          │
│ Melee Weapons, Shields, Off-Hand... │
│ All Armor                           │
│ Chest Armor                         │
│ Shields                             │
│ All Weapons, Shields, Off-Hands     │
└─────────────────────────────────────┘

I tried str.replace, but the replacement argument is not interpreted as a pattern:

sample.with_columns(pl.col("equip").str.replace("[a-z][A-Z]", "[a-z], [A-Z]"))

I also tried a tip found on the Polars GitHub, but it drops the last letter of one word and the first letter of the next word at each match, as it would with:

sample.with_columns(pl.col("equip").str.replace("[a-z][A-Z]", ", "))

Any ideas?

Bonus question:
I imagine the answer for the simple case would also solve the harder case, but in case it does not, here is the hard case:

I do have another column with a slightly harder regex pattern than "[a-z][A-Z]", should be something like "[a-z][A-Z]|[a-z]+|[a-z][1-9]" (I did not stress much about the exact regex yet). The aim is also to just put a comma between attributes:

sample2 = pl.DataFrame({"attributes": ['+10% Aether Damage+30 Defensive Ability16% Aether Resistance6% Less Damage from Aetherials6% Less Damage from Aether Corruptions',
     '4-6 Aether Damage+25% Aether Damage10% Physical Damage converted to Aether DamageAether Tendril (Granted by Item)',
     '2-8 Lightning Damage+25% Lightning Damage+25% Electrocute Damage10% Physical Damage converted to Lightning DamageEmpowered Lightning Nova (Granted by Item)',
     '+10 Health Regenerated per Second+24 Armor20% Poison & Acid Resistance',
     '+22 Defensive Ability10% Chance to Avoid Projectiles+18 Armor',
     '+15 Physique+10% Shield Block ChanceShield Slam (Granted by Item)',
     '+10% Chaos Damage+30 Defensive Ability16% Chaos Resistance6% Less Damage from Chthonics']})

Answer 1

Score: 1

You can use capture groups in your pattern:

df.with_columns(pl.col("equip").str.replace_all(r"([a-z])([A-Z])", "$1, $2"))

shape: (7, 1)
┌─────────────────────────────────────┐
│ equip                               │
│ ---                                 │
│ str                                 │
╞═════════════════════════════════════╡
│ Amulets, Medals                     │
│ Guns, Crossbows, Off-Hands          │
│ Melee Weapons, Shields, Off-Hand... │
│ All Armor                           │
│ Chest Armor                         │
│ Shields                             │
│ All Weapons, Shields, Off-Hands     │
└─────────────────────────────────────┘

You may also want to use the Unicode classes \p{lower} and \p{upper} instead of [a-z] and [A-Z].

The regex syntax that polars supports is: https://docs.rs/regex/latest/regex/
