How to get first name and last name in pandas when the last name consists of multiple words

Question

I have a DataFrame and need to split each full name into a first name and a last name. This is what I have so far:

    import pandas as pd
    df = pd.DataFrame({'name': ['Victor De La Cruz', 'Ashley Smith', 'Angel Miguel Hernandez', 'Hank Hill']})
    df['first_name'] = df['name'].str.split().str[0]
    df['last_name'] = df['name'].str.split().str[1:]

The two new columns look like this:

      first_name            last_name
    0     Victor       [De, La, Cruz]
    1     Ashley              [Smith]
    2      Angel  [Miguel, Hernandez]
    3       Hank               [Hill]

I tried df['last_name'].replace('[', '') to strip the unwanted characters, but it didn't work.

Desired output:

      first_name         last_name
    0     Victor        De La Cruz
    1     Ashley             Smith
    2      Angel  Miguel Hernandez
    3       Hank              Hill

Any suggestions would be appreciated. Thank you!

Answer 1

Score: 1

Just join the pieces back together:

    df['last_name'] = df['last_name'].str.join(' ')

After split(), your Series holds list objects, not strings, which is why .replace() doesn't do anything useful here: the brackets are only part of how the lists are printed.
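For reference, here is a minimal end-to-end sketch of the split-then-join approach. The column name 'name' and the intermediate variable parts are assumptions made for illustration, since the question does not say what the source column is called.

```python
import pandas as pd

# 'name' is an assumed column label; the question does not name the column.
df = pd.DataFrame({"name": [
    "Victor De La Cruz",
    "Ashley Smith",
    "Angel Miguel Hernandez",
    "Hank Hill",
]})

# Split on whitespace once and reuse the result; every element is a Python list.
parts = df["name"].str.split()

df["first_name"] = parts.str[0]                 # first token of each list
df["last_name"] = parts.str[1:].str.join(" ")   # remaining tokens, joined by a space

print(type(parts.iloc[0]))                      # shows that the elements are lists
print(df[["first_name", "last_name"]])
```

Keeping the split result in one variable avoids splitting the column twice, but that is only a convenience and not something the answer requires.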

Answer 2

Score: 1

I'd suggest using the n keyword argument to limit the splits to only the first space. You could also use expand=True to get the pieces back as separate columns:

    import pandas as pd
    s = pd.Series([
        'Victor De La Cruz',
        'Ashley Smith',
        'Angel Miguel Hernandez',
        'Hank Hill'
    ])
    df = s.str.split(n=1, expand=True)
    df.columns = ["first_name", "last_name"]

Output:

      first_name         last_name
    0     Victor        De La Cruz
    1     Ashley             Smith
    2      Angel  Miguel Hernandez
    3       Hank              Hill
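If the names already live in a DataFrame column, the same n=1 split can probably be assigned straight into two new columns. The sketch below assumes a column called 'name' and adds one hypothetical single-word entry ('Cher', not part of the original data) to show that a name with no second part ends up with a missing last name.

```python
import pandas as pd

# 'name' is an assumed column label; 'Cher' is a hypothetical extra row
# added only to illustrate the single-word case.
df = pd.DataFrame({"name": [
    "Victor De La Cruz",
    "Ashley Smith",
    "Angel Miguel Hernandez",
    "Hank Hill",
    "Cher",
]})

# n=1 splits at the first run of whitespace only; expand=True returns a
# two-column DataFrame, which is assigned to the two new columns by position.
df[["first_name", "last_name"]] = df["name"].str.split(n=1, expand=True)

print(df)
```

Rows without a second part come back as missing values in last_name, so a follow-up such as df["last_name"].fillna("") may be worth considering, depending on how the data is used later.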

huangapple
  • Published on May 30, 2023, 01:10:15
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76359178.html