Polars .str.replace with expression or .str.split with regex
Question
I have this dataframe:
import polars as pl

sample = pl.DataFrame({"equip": ['AmuletsMedals', 'Guns, CrossbowsOff-Hands', 'Melee WeaponsShieldsOff-Hands',
                                 'All Armor', 'Chest Armor', 'Shields', 'All WeaponsShieldsOff-Hands']})
print(sample)
shape: (7, 1)
┌───────────────────────────────┐
│ equip                         │
│ ---                           │
│ str                           │
╞═══════════════════════════════╡
│ AmuletsMedals                 │
│ Guns, CrossbowsOff-Hands      │
│ Melee WeaponsShieldsOff-Hands │
│ All Armor                     │
│ Chest Armor                   │
│ Shields                       │
│ All WeaponsShieldsOff-Hands   │
└───────────────────────────────┘
My aim is to put a comma between words:
answer = pl.DataFrame({"equip": ['Amulets, Medals', 'Guns, Crossbows, Off-Hands', 'Melee Weapons, Shields, Off-Hands',
                                 'All Armor', 'Chest Armor', 'Shields', 'All Weapons, Shields, Off-Hands']})
print(answer)
shape: (7, 1)
┌─────────────────────────────────────┐
│ equip                               │
│ ---                                 │
│ str                                 │
╞═════════════════════════════════════╡
│ Amulets, Medals                     │
│ Guns, Crossbows, Off-Hands          │
│ Melee Weapons, Shields, Off-Hand... │
│ All Armor                           │
│ Chest Armor                         │
│ Shields                             │
│ All Weapons, Shields, Off-Hands     │
└─────────────────────────────────────┘
I tried str.replace, but the replacement string is inserted literally instead of being treated as a pattern:
sample.with_columns(pl.col("equip").str.replace("[a-z][A-Z]", "[a-z], [A-Z]"))
I also tried a tip found on the Polars GitHub, but it drops the last letter of the preceding word and the first letter of the following word at each match, just as this does:
sample.with_columns(pl.col("equip").str.replace("[a-z][A-Z]", ", "))
Any ideas?
Bonus question:
I imagine the answer for the simple case would also solve the harder case, but in case it does not, here is the hard case:
I have another column that needs a slightly harder regex pattern than "[a-z][A-Z]", probably something like "[a-z][A-Z]|[a-z]+|[a-z][1-9]" (I have not worked out the exact regex yet). The aim is again just to put a comma between attributes:
sample2 = pl.DataFrame({"attributes": [
    '+10% Aether Damage+30 Defensive Ability16% Aether Resistance6% Less Damage from Aetherials6% Less Damage from Aether Corruptions',
    '4-6 Aether Damage+25% Aether Damage10% Physical Damage converted to Aether DamageAether Tendril (Granted by Item)',
    '2-8 Lightning Damage+25% Lightning Damage+25% Electrocute Damage10% Physical Damage converted to Lightning DamageEmpowered Lightning Nova (Granted by Item)',
    '+10 Health Regenerated per Second+24 Armor20% Poison & Acid Resistance',
    '+22 Defensive Ability10% Chance to Avoid Projectiles+18 Armor',
    '+15 Physique+10% Shield Block ChanceShield Slam (Granted by Item)',
    '+10% Chaos Damage+30 Defensive Ability16% Chaos Resistance6% Less Damage from Chthonics',
]})
Answer 1
Score: 1
You can use capture groups in your pattern:
sample.with_columns(pl.col("equip").str.replace_all(r"([a-z])([A-Z])", "$1, $2"))
shape: (7, 1)
┌─────────────────────────────────────┐
│ equip                               │
│ ---                                 │
│ str                                 │
╞═════════════════════════════════════╡
│ Amulets, Medals                     │
│ Guns, Crossbows, Off-Hands          │
│ Melee Weapons, Shields, Off-Hand... │
│ All Armor                           │
│ Chest Armor                         │
│ Shields                             │
│ All Weapons, Shields, Off-Hands     │
└─────────────────────────────────────┘
You may also want to use the Unicode classes \p{lower} and \p{upper} instead.
The regex syntax that polars supports is: https://docs.rs/regex/latest/regex/