How to explode a Python pandas DataFrame based on a string and criteria

Question

How can I turn this StringDataFrame:

|String|
|-|
|Jon likes {ExplodeAnimals}.|
|Jon eats {ExplodeFruit}.|

into this:

|String|
|-|
|Jon likes Cats.|
|Jon likes Dogs.|
|Jon likes Tigers.|
|Jon likes Llamas.|
|Jon eats Apples.|
|Jon eats Pears.|
|Jon eats Bananas.|
|Jon eats Strawberries.|

based on this ThingsDataFrame?

|Thing|Type|
|-|-|
|Cats|animal|
|Dogs|animal|
|Tigers|animal|
|Llamas|animal|
|Apples|fruit|
|Pears|fruit|
|Bananas|fruit|
|Strawberries|fruit|
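
For anyone who wants to reproduce this, the two frames described above could be built roughly as follows. The construction code is not part of the original post; it is only a sketch of the inputs as they are shown in the question.

```python
import pandas as pd

# Template strings containing the placeholders to expand.
StringDataFrame = pd.DataFrame(
    {"String": ["Jon likes {ExplodeAnimals}.", "Jon eats {ExplodeFruit}."]}
)

# Lookup table: each Thing belongs to exactly one Type.
ThingsDataFrame = pd.DataFrame(
    {
        "Thing": ["Cats", "Dogs", "Tigers", "Llamas",
                  "Apples", "Pears", "Bananas", "Strawberries"],
        "Type": ["animal"] * 4 + ["fruit"] * 4,
    }
)

print(StringDataFrame)
print(ThingsDataFrame)
```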

Answer 1

Score: 2

Option 1

You can use merge/map.

# You can skip this mapping if the templates already use the Type names, e.g. "Jon likes {animal}."
mapper = {'ExplodeAnimals': 'animal', 'ExplodeFruit': 'fruit'}

out = (StringDataFrame['String']
  # split each template into the leading text and the placeholder name
  # (the trailing "." is not captured, so it is dropped from the result)
  .str.extract(r'(?P<String>.*) {(?P<Type>.*)}')
  # translate the placeholder name into a Type found in ThingsDataFrame
  .assign(Type=lambda d: d['Type'].map(mapper))
  # one row per (template, Thing) pair sharing the same Type
  .merge(ThingsDataFrame, on='Type')
  .assign(String=lambda d: d['String']+' '+d['Thing'])
  [['String']]
)

print(out)

Output:

                  String
0         Jon likes Cats
1         Jon likes Dogs
2       Jon likes Tigers
3       Jon likes Llamas
4        Jon eats Apples
5         Jon eats Pears
6       Jon eats Bananas
7  Jon eats Strawberries
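
Note that the output above has lost the trailing period from the templates, because the pattern only captures the text before the placeholder. If the period matters, one possible variant (my own sketch, not part of the original answer) is to capture the text on both sides of the placeholder; the column names `Prefix` and `Suffix` are made up for illustration.

```python
import pandas as pd

StringDataFrame = pd.DataFrame(
    {"String": ["Jon likes {ExplodeAnimals}.", "Jon eats {ExplodeFruit}."]}
)
ThingsDataFrame = pd.DataFrame(
    {
        "Thing": ["Cats", "Dogs", "Tigers", "Llamas",
                  "Apples", "Pears", "Bananas", "Strawberries"],
        "Type": ["animal"] * 4 + ["fruit"] * 4,
    }
)

mapper = {"ExplodeAnimals": "animal", "ExplodeFruit": "fruit"}

out = (
    StringDataFrame["String"]
    # Prefix = text before the placeholder, Type = placeholder name,
    # Suffix = text after it (hypothetical column names)
    .str.extract(r"(?P<Prefix>.*)\{(?P<Type>[^{}]*)\}(?P<Suffix>.*)")
    .assign(Type=lambda d: d["Type"].map(mapper))
    .merge(ThingsDataFrame, on="Type")
    # put the Thing back exactly where the placeholder was
    .assign(String=lambda d: d["Prefix"] + d["Thing"] + d["Suffix"])
    [["String"]]
)

print(out)
```

This still assumes a single placeholder per template; Option 2 handles several.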

Option 2

Probably less efficient but more versatile: replace each placeholder with a curly-bracket list of its Things, then let the braceexpand module perform brace expansion:

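# pip install braceexpand
from braceexpand import braceexpand

# one comma-separated string of Things per Type, e.g. animal -> 'Cats,Dogs,Tigers,Llamas'
mapper = ThingsDataFrame.groupby('Type')['Thing'].agg(','.join)

(StringDataFrame['String']
 # fill each {Type} with its list: 'Jon likes {Cats,Dogs,Tigers,Llamas}.'
 # (regex=True must be passed explicitly in pandas >= 2.0, where the default is regex=False)
 .str.replace(r'(?<={)([^{}]*)(?=})', lambda m: mapper.get(m.group(1)), regex=True)
 # braceexpand turns 'a {x,y}.' into 'a x.' and 'a y.'
 .apply(lambda x: list(braceexpand(x)))
 .explode()
)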

NB. This assumes the StringDataFrame input is simplified so that the placeholders are the Type names themselves:

                String
0  Jon likes {animal}.
1    Jon eats {fruit}.

Output:

0           Jon likes Cats.
0           Jon likes Dogs.
0         Jon likes Tigers.
0         Jon likes Llamas.
1          Jon eats Apples.
1           Jon eats Pears.
1         Jon eats Bananas.
1    Jon eats Strawberries.
Name: String, dtype: object
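
To make the two steps before the brace expansion easier to follow, here is a small sketch that only prints the intermediate values, using plain pandas and no braceexpand. The variable name `braced` is mine, and `regex=True` is passed explicitly on the assumption that a recent pandas version is in use.

```python
import pandas as pd

StringDataFrame = pd.DataFrame(
    {"String": ["Jon likes {animal}.", "Jon eats {fruit}."]}
)
ThingsDataFrame = pd.DataFrame(
    {
        "Thing": ["Cats", "Dogs", "Tigers", "Llamas",
                  "Apples", "Pears", "Bananas", "Strawberries"],
        "Type": ["animal"] * 4 + ["fruit"] * 4,
    }
)

# Step 1: a Series indexed by Type whose values are comma-joined Things.
mapper = ThingsDataFrame.groupby("Type")["Thing"].agg(",".join)
print(mapper)

# Step 2: swap the name inside each pair of braces for that joined string,
# leaving the braces themselves in place for the later brace expansion.
braced = StringDataFrame["String"].str.replace(
    r"(?<=\{)([^{}]*)(?=\})",
    lambda m: mapper.get(m.group(1)),
    regex=True,
)
print(braced.tolist())
```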

This also enables you to do funky stuff like expanding several placeholders in one string:

print(StringDataFrame)
#                                  String
# 0  Jon likes {animal} that eat {fruit}.

print(ThingsDataFrame)
#     Thing    Type
# 0    Cats  animal
# 1    Dogs  animal
# 2  Apples   fruit
# 3   Pears   fruit

mapper = ThingsDataFrame.groupby('Type')['Thing'].agg(','.join)

(StringDataFrame['String']
 # regex=True must be passed explicitly in pandas >= 2.0 when the replacement is a callable
 .str.replace(r'(?<={)([^{}]*)(?=})', lambda m: mapper.get(m.group(1)), regex=True)
 .apply(lambda x: list(braceexpand(x)))
 .explode()
)

# 0    Jon likes Cats that eat Apples.
# 0     Jon likes Cats that eat Pears.
# 0    Jon likes Dogs that eat Apples.
# 0     Jon likes Dogs that eat Pears.
# Name: String, dtype: object
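
If installing braceexpand is not an option, the same combinations can probably be produced with the standard library alone. The sketch below is my own alternative, not the answer's method: `expand`, `choices` and `PLACEHOLDER` are hypothetical names, and it assumes the templates contain no braces other than the `{Type}` placeholders.

```python
import itertools
import re

import pandas as pd

StringDataFrame = pd.DataFrame({"String": ["Jon likes {animal} that eat {fruit}."]})
ThingsDataFrame = pd.DataFrame(
    {
        "Thing": ["Cats", "Dogs", "Apples", "Pears"],
        "Type": ["animal", "animal", "fruit", "fruit"],
    }
)

# Type -> list of Things, e.g. {"animal": ["Cats", "Dogs"], "fruit": [...]}
choices = ThingsDataFrame.groupby("Type")["Thing"].agg(list).to_dict()

PLACEHOLDER = re.compile(r"\{([^{}]*)\}")


def expand(template):
    """Return every string obtained by filling each {Type} placeholder."""
    keys = PLACEHOLDER.findall(template)
    # Turn "{animal}" into a positional "{}" slot so str.format can fill it.
    pattern = PLACEHOLDER.sub("{}", template)
    # Cartesian product of the candidate Things, one list per placeholder.
    combos = itertools.product(*(choices[key] for key in keys))
    return [pattern.format(*combo) for combo in combos]


out = StringDataFrame["String"].apply(expand).explode()
print(out)
```

An unknown placeholder name raises a `KeyError` here, which may or may not be what you want.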

huangapple
  • Posted on May 14, 2023 at 00:29:52
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76243806.html