How to explode Python Pandas Dataframe based on string and criteria
Question
How to turn this `StringDataFrame`:

|String|
|-|
|Jon likes {ExplodeAnimals}.|
|Jon eats {ExplodeFruit}.|

Into this:

|String|
|-|
|Jon likes Cats.|
|Jon likes Dogs.|
|Jon likes Tigers.|
|Jon likes Llamas.|
|Jon eats Apples.|
|Jon eats Pears.|
|Jon eats Bananas.|
|Jon eats Strawberries.|

Based on this `ThingsDataFrame`:

|Thing|Type|
|-|-|
|Cats|animal|
|Dogs|animal|
|Tigers|animal|
|Llamas|animal|
|Apples|fruit|
|Pears|fruit|
|Bananas|fruit|
|Strawberries|fruit|
Answer 1
Score: 2
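Option 1
# You could skip this mapping if the template were "Jon likes {animal}."
mapper = {'ExplodeAnimals': 'animal', 'ExplodeFruit': 'fruit'}
out = (StringDataFrame['String']
       # split each template into its leading text and the placeholder name
       .str.extract(r'(?P<String>.*) \{(?P<Type>.*)\}')
       # translate the placeholder name into a value of the Type column
       .assign(Type=lambda d: d['Type'].map(mapper))
       # the merge yields one row per matching Thing
       .merge(ThingsDataFrame, on='Type')
       .assign(String=lambda d: d['String'] + ' ' + d['Thing'])
       [['String']]
      )
print(out)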
Output (the pattern discards everything after the closing brace, so the trailing period is dropped):
String
0 Jon likes Cats
1 Jon likes Dogs
2 Jon likes Tigers
3 Jon likes Llamas
4 Jon eats Apples
5 Jon eats Pears
6 Jon eats Bananas
7 Jon eats Strawberries
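If the text after the placeholder should be kept, one possible variant is to capture it as well. This is only a sketch: the `Prefix` and `Suffix` group names are made up for illustration, and the sample frames are a shortened copy of the ones in the question.

```python
import pandas as pd

# Shortened sample data, following the frames in the question.
StringDataFrame = pd.DataFrame(
    {"String": ["Jon likes {ExplodeAnimals}.", "Jon eats {ExplodeFruit}."]}
)
ThingsDataFrame = pd.DataFrame(
    {
        "Thing": ["Cats", "Dogs", "Apples", "Pears"],
        "Type": ["animal", "animal", "fruit", "fruit"],
    }
)

mapper = {"ExplodeAnimals": "animal", "ExplodeFruit": "fruit"}

# Prefix and Suffix are illustrative group names: the text before and
# after the single {placeholder} in each template.
parts = StringDataFrame["String"].str.extract(
    r"(?P<Prefix>.*)\{(?P<Type>[^{}]*)\}(?P<Suffix>.*)"
)

out = (
    parts.assign(Type=parts["Type"].map(mapper))
    .merge(ThingsDataFrame, on="Type")
    # rebuild the sentence around each matching Thing
    .assign(String=lambda d: d["Prefix"] + d["Thing"] + d["Suffix"])
    [["String"]]
)
result = out["String"].tolist()
print(out)
```

Like the original, this assumes exactly one placeholder per string.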
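Option 2
Probably less efficient but more versatile: use the curly-bracket notation to perform brace expansion (with the braceexpand module):
# pip install braceexpand
from braceexpand import braceexpand
# one comma-separated string of Things per Type, e.g. 'Cats,Dogs,Tigers,Llamas'
mapper = ThingsDataFrame.groupby('Type')['Thing'].agg(','.join)
(StringDataFrame['String']
 # swap the name inside the braces for its comma-separated list;
 # regex=True is required for a callable replacement in pandas 2.0+
 .str.replace(r'(?<={)([^{}]*)(?=})', lambda m: mapper.get(m.group(1)), regex=True)
 # expand "{a,b}" into one string per alternative
 .apply(lambda x: list(braceexpand(x)))
 .explode()
)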
NB: this assumes the StringDataFrame input is simplified to:
String
0 Jon likes {animal}.
1 Jon eats {fruit}.
Output:
0 Jon likes Cats.
0 Jon likes Dogs.
0 Jon likes Tigers.
0 Jon likes Llamas.
1 Jon eats Apples.
1 Jon eats Pears.
1 Jon eats Bananas.
1 Jon eats Strawberries.
Name: String, dtype: object
This enables you to do funky stuff like:
print(StringDataFrame)
# String
# 0 Jon likes {animal} that eat {fruit}.
print(ThingsDataFrame)
# Thing Type
# 0 Cats animal
# 1 Dogs animal
# 2 Apples fruit
# 3 Pears fruit
mapper = ThingsDataFrame.groupby('Type')['Thing'].agg(','.join)
(StringDataFrame['String']
 # regex=True is required for a callable replacement in pandas 2.0+
 .str.replace(r'(?<={)([^{}]*)(?=})', lambda m: mapper.get(m.group(1)), regex=True)
 .apply(lambda x: list(braceexpand(x)))
 .explode()
)
# 0 Jon likes Cats that eat Apples.
# 0 Jon likes Cats that eat Pears.
# 0 Jon likes Dogs that eat Apples.
# 0 Jon likes Dogs that eat Pears.
# Name: String, dtype: object
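For readers who would rather not install braceexpand, the multi-placeholder expansion can be roughly sketched with the standard library alone. `expand_template` is a hypothetical helper written for this illustration, not part of braceexpand or pandas, and it only handles flat `{name}` placeholders (no nesting or ranges).

```python
import itertools
import re


def expand_template(template, choices):
    """Return every string obtained by filling each {name} placeholder.

    `choices` maps a placeholder name to the list of values to substitute.
    Hypothetical helper for illustration only.
    """
    # Placeholder names, in order of appearance.
    keys = re.findall(r"\{([^{}]*)\}", template)
    # Literal text between placeholders; always len(keys) + 1 pieces.
    literals = re.split(r"\{[^{}]*\}", template)
    results = []
    # The Cartesian product yields one combination of values per output string.
    for combo in itertools.product(*(choices[k] for k in keys)):
        pieces = [literals[0]]
        for value, literal in zip(combo, literals[1:]):
            pieces.append(value)
            pieces.append(literal)
        results.append("".join(pieces))
    return results


choices = {"animal": ["Cats", "Dogs"], "fruit": ["Apples", "Pears"]}
expanded = expand_template("Jon likes {animal} that eat {fruit}.", choices)
for line in expanded:
    print(line)
```

With pandas, the `choices` dictionary could be built from `ThingsDataFrame.groupby('Type')['Thing'].agg(list).to_dict()`, and the helper applied to the column with `.apply(...)` followed by `.explode()`.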