How to str split and explode on multiple columns?

Question

Is there a way to split and explode on multiple columns at once?
This may be a basic task, but I am drawing a blank at the moment.

My pandas dataframe:

Name          Title                                        City                Country               Source
Haliey Wells  Data Scientist; Data Analyst; Mathematician  Paris; Suva; Paris  France; FIJI; France  Linkedin
Bron Levy     Data Scientist; Data Analyst                 HELSINKI; Berlin    Finland; Germany      Kaggle
Grace Kalie   Data Analyst; Mathematician                  Athens; Budapest    Greece; Hungary       Kaggle
Evan James    ML Engineer; Developer                       Tokyo; Lima         Japan; Peru           Google

Currently, the code I have only works on one column at a time:

    df2 = (df.set_index(['Name', 'Source'])  # columns that should stay untouched go into the index
             .apply(lambda x: x.str.split(';').explode())  # split on ';' (it is actually a pipe, |; I used ';' here for readability)
             .reset_index())

Note: the code above generally works for me, but I usually apply it to only one column.

Desired output:

Name          Title           City      Country  Source
Haliey Wells  Data Scientist  Paris     France   Linkedin
Haliey Wells  Data Analyst    Suva      Fiji     Linkedin
Haliey Wells  Mathematician   Paris     France   Linkedin
Bron Levy     Data Scientist  HELSINKI  Finland  Kaggle
Bron Levy     Data Analyst    Berlin    Germany  Kaggle
Grace Kalie   Data Analyst    Athens    Greece   Kaggle
Grace Kalie   Mathematician   Budapest  Hungary  Kaggle
Evan James    ML Engineer     Tokyo     Japan    Google
Evan James    Developer       Lima      Peru     Google

Answer 1

Score: 5

Split first, then do a normal multi-column explode. For example:

    cols = ['Title', 'City', 'Country']
    df.assign(**{c: df[c].str.split('; ') for c in cols}).explode(cols)

Output:

   Name          Title           City      Country  Source
0  Haliey Wells  Data Scientist  Paris     France   Linkedin
0  Haliey Wells  Data Analyst    Suva      FIJI     Linkedin
0  Haliey Wells  Mathematician   Paris     France   Linkedin
1  Bron Levy     Data Scientist  HELSINKI  Finland  Kaggle
1  Bron Levy     Data Analyst    Berlin    Germany  Kaggle
2  Grace Kalie   Data Analyst    Athens    Greece   Kaggle
2  Grace Kalie   Mathematician   Budapest  Hungary  Kaggle
3  Evan James    ML Engineer     Tokyo     Japan    Google
3  Evan James    Developer       Lima      Peru     Google
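
For anyone who wants to run this end to end, here is a minimal, self-contained sketch of the split-then-explode approach. The DataFrame is retyped from the question's sample. The whitespace-tolerant regex and `ignore_index=True` are additions for illustration rather than part of the answer above, and the sketch assumes pandas 1.4 or newer (1.3 introduced multi-column `explode`, 1.4 the `regex` argument of `str.split`).

```python
import pandas as pd

# Sample data retyped from the question (values separated by '; ').
df = pd.DataFrame({
    "Name": ["Haliey Wells", "Bron Levy", "Grace Kalie", "Evan James"],
    "Title": [
        "Data Scientist; Data Analyst; Mathematician",
        "Data Scientist; Data Analyst",
        "Data Analyst; Mathematician",
        "ML Engineer; Developer",
    ],
    "City": ["Paris; Suva; Paris", "HELSINKI; Berlin", "Athens; Budapest", "Tokyo; Lima"],
    "Country": ["France; FIJI; France", "Finland; Germany", "Greece; Hungary", "Japan; Peru"],
    "Source": ["Linkedin", "Kaggle", "Kaggle", "Google"],
})

cols = ["Title", "City", "Country"]

# Split on the delimiter plus any whitespace around it, so the pieces
# carry no stray spaces. Assumption: for pipe-delimited data, use
# r"\s*\|\s*" instead (the pipe must be escaped in a regex).
split_cols = {c: df[c].str.split(r"\s*;\s*", regex=True) for c in cols}

# Explode all three list columns together; ignore_index=True renumbers
# the rows 0..n-1 instead of repeating the original row labels.
df2 = df.assign(**split_cols).explode(cols, ignore_index=True)
print(df2)
```

Leaving out `ignore_index=True` keeps the repeated 0, 0, 0, 1, 1, ... index shown in the output above, which can be handy if you later need to trace each row back to its source row.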

Answer 2

Score: 2

Here's another way, using apply instead of a dictionary comprehension and unpacking:

```python
# Untouched columns go into the index; '; ' (with the space) avoids leading spaces in the pieces
(df.set_index(['Name', 'Source'])
   .apply(lambda x: x.str.split('; '))
   .explode(column=df.columns[1:-1].tolist())  # assumes the columns to split sit between Name and Source
   .reset_index())
```

Output:

   Name          Source    Title           City      Country
0  Haliey Wells  Linkedin  Data Scientist  Paris     France
1  Haliey Wells  Linkedin  Data Analyst    Suva      FIJI
2  Haliey Wells  Linkedin  Mathematician   Paris     France
3  Bron Levy     Kaggle    Data Scientist  HELSINKI  Finland
4  Bron Levy     Kaggle    Data Analyst    Berlin    Germany
5  Grace Kalie   Kaggle    Data Analyst    Athens    Greece
6  Grace Kalie   Kaggle    Mathematician   Budapest  Hungary
7  Evan James    Google    ML Engineer     Tokyo     Japan
8  Evan James    Google    Developer       Lima      Peru
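
One caveat with a multi-column `explode`: it expects every listed column to hold the same number of items in a given row, and as far as I know it raises a `ValueError` otherwise. Below is a rough sketch for spotting such rows before exploding. The two-row frame and the names `lists`, `counts`, `bad_rows`, and `clean` are made up for illustration, and the sketch assumes there are no missing values in the split columns.

```python
import pandas as pd

# Hypothetical frame: row "B" has two titles but only one city.
df = pd.DataFrame({
    "Name": ["A", "B"],
    "Title": ["x; y", "z; w"],
    "City": ["p; q", "r"],
})
cols = ["Title", "City"]

# Turn each cell into a list of pieces.
lists = df[cols].apply(lambda s: s.str.split("; "))

# Count the pieces per cell; a row is consistent when all counts agree.
counts = lists.apply(lambda s: s.map(len))
bad_rows = counts.nunique(axis=1) > 1

# Rows that need manual cleanup before they can be exploded.
print(df.loc[bad_rows, "Name"].tolist())

# Explode only the consistent rows.
clean = (df.loc[~bad_rows]
           .assign(**{c: lists.loc[~bad_rows, c] for c in cols})
           .explode(cols))
print(clean)
```

Whether to drop, pad, or hand-fix the inconsistent rows depends on the data, so the sketch only reports them.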

huangapple
  • Posted on May 14, 2023, 10:28:30
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76245574.html