How to str split and explode on multiple columns?

Question

Is there a way to split and explode on multiple columns?
This may be a basic task, but I am drawing a blank at the moment.

My pandas DataFrame:

Name          Title                                        City                Country               Source
Haliey Wells  Data Scientist; Data Analyst; Mathematician  Paris; Suva; Paris  France; FIJI; France  Linkedin
Bron Levy     Data Scientist; Data Analyst                 HELSINKI; Berlin    Finland; Germany      Kaggle
Grace Kalie   Data Analyst; Mathematician                  Athens; Budapest    Greece; Hungary       Kaggle
Evan James    ML Engineer; Developer                       Tokyo; Lima         Japan; Peru           Google

Currently the code that I have will only work on one column at a time:

df2 = (df.set_index(['Name', 'Source'])  # columns that should stay untouched go into the index
         .apply(lambda x: x.str.split(';').explode())  # split on ';' (the real data uses a pipe, '|'; ';' is shown here for readability)
         .reset_index())

Note: the code above generally works for me, but I usually use it on one column.

Desired output:

Name          Title           City      Country  Source
Haliey Wells  Data Scientist  Paris     France   Linkedin
Haliey Wells  Data Analyst    Suva      Fiji     Linkedin
Haliey Wells  Mathematician   Paris     France   Linkedin
Bron Levy     Data Scientist  HELSINKI  Finland  Kaggle
Bron Levy     Data Analyst    Berlin    Germany  Kaggle
Grace Kalie   Data Analyst    Athens    Greece   Kaggle
Grace Kalie   Mathematician   Budapest  Hungary  Kaggle
Evan James    ML Engineer     Tokyo     Japan    Google
Evan James    Developer       Lima      Peru     Google

Answer 1

Score: 5

Split first, then do a normal multi-column explode. For example:

cols = ['Title', 'City', 'Country']
df.assign(**{c: df[c].str.split('; ') for c in cols}).explode(cols)


           Name           Title      City  Country    Source
0  Haliey Wells  Data Scientist     Paris   France  Linkedin
0  Haliey Wells    Data Analyst      Suva     FIJI  Linkedin
0  Haliey Wells   Mathematician     Paris   France  Linkedin
1     Bron Levy  Data Scientist  HELSINKI  Finland    Kaggle
1     Bron Levy    Data Analyst    Berlin  Germany    Kaggle
2   Grace Kalie    Data Analyst    Athens   Greece    Kaggle
2   Grace Kalie   Mathematician  Budapest  Hungary    Kaggle
3    Evan James     ML Engineer     Tokyo    Japan    Google
3    Evan James       Developer      Lima     Peru    Google
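For anyone who wants to run this end to end, here is a minimal self-contained sketch of the split-then-explode approach. The DataFrame is rebuilt by hand from the table in the question, and the names `df2` and the final `reset_index(drop=True)` are additions for illustration, not part of the original answer. Passing a list of columns to `explode` needs pandas 1.3 or newer.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "Name": ["Haliey Wells", "Bron Levy", "Grace Kalie", "Evan James"],
    "Title": [
        "Data Scientist; Data Analyst; Mathematician",
        "Data Scientist; Data Analyst",
        "Data Analyst; Mathematician",
        "ML Engineer; Developer",
    ],
    "City": ["Paris; Suva; Paris", "HELSINKI; Berlin", "Athens; Budapest", "Tokyo; Lima"],
    "Country": ["France; FIJI; France", "Finland; Germany", "Greece; Hungary", "Japan; Peru"],
    "Source": ["Linkedin", "Kaggle", "Kaggle", "Google"],
})

cols = ["Title", "City", "Country"]

# Turn each delimited string into a list, then explode the three
# list columns together so the i-th items of each list share a row.
df2 = (
    df.assign(**{c: df[c].str.split("; ") for c in cols})
      .explode(cols)
      .reset_index(drop=True)  # assumption: a fresh 0..n-1 index is wanted instead of repeated labels
)

print(df2)
```

Multi-column `explode` expects the lists in each row to have the same length across the exploded columns, which holds for the sample data here.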



# Answer 2
**Score**: 2

Here's another way, using apply instead of a dictionary comprehension and unpacking:

```python
# Split every non-index column into lists, then explode Title, City and Country together.
# Note: splitting on ';' alone keeps the space after each separator; split on '; ' to drop it.
df.set_index(['Name', 'Source']) \
  .apply(lambda x: x.str.split(';')) \
  .explode(column=df.columns[1:-1].tolist()).reset_index()
```

Output:

           Name    Source           Title      City  Country
0  Haliey Wells  Linkedin  Data Scientist     Paris   France
1  Haliey Wells  Linkedin    Data Analyst      Suva     FIJI
2  Haliey Wells  Linkedin   Mathematician     Paris   France
3     Bron Levy    Kaggle  Data Scientist  HELSINKI  Finland
4     Bron Levy    Kaggle    Data Analyst    Berlin  Germany
5   Grace Kalie    Kaggle    Data Analyst    Athens   Greece
6   Grace Kalie    Kaggle   Mathematician  Budapest  Hungary
7    Evan James    Google     ML Engineer     Tokyo    Japan
8    Evan James    Google       Developer      Lima     Peru
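The question mentions that the real data is separated by a pipe (`|`) rather than `;`. The following is only a sketch of how the same idea might look in that case; the two-row sample, the mixed spacing around the pipes, and the variable name `out` are made up for illustration. It splits on the bare separator and strips leftover whitespace afterwards, so it does not matter whether the source has spaces around the pipe.

```python
import pandas as pd

# Hypothetical sample using the pipe separator mentioned in the question,
# with inconsistent spacing around the separator.
df = pd.DataFrame({
    "Name": ["Bron Levy", "Evan James"],
    "Title": ["Data Scientist | Data Analyst", "ML Engineer|Developer"],
    "City": ["HELSINKI | Berlin", "Tokyo|Lima"],
    "Country": ["Finland | Germany", "Japan|Peru"],
    "Source": ["Kaggle", "Google"],
})

cols = ["Title", "City", "Country"]

out = df.copy()
for c in cols:
    # A one-character pattern is treated as a literal string by str.split,
    # so '|' does not need regex escaping here.
    out[c] = out[c].str.split("|")

# Explode the list columns together, then renumber the rows.
out = out.explode(cols).reset_index(drop=True)

# Remove the whitespace left around each piece by the bare-separator split.
for c in cols:
    out[c] = out[c].str.strip()

print(out)
```

Stripping after the explode keeps the split pattern simple; splitting on a whitespace-tolerant regular expression would be another option.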

huangapple
  • Published on May 14, 2023, 10:28:30
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76245574.html