How to get first name and last name in pandas when the last name consists of multiple words

Question

I have a data frame and need to separate the first and last names. This is how far I have got:

import pandas as pd

# The column name 'name' is assumed; the original snippet did not show one.
df = pd.DataFrame({'name': ['Victor De La Cruz', 'Ashley Smith', 'Angel Miguel Hernandez', 'Hank Hill']})

df['first_name'] = df['name'].str.split().str[0]
df['last_name'] = df['name'].str.split().str[1:]

Output

first_name        last_name
 Victor           [De, La, Cruz]
 Ashley           [Smith]
 Angel            [Miguel, Hernandez]
 Hank             [Hill]

I tried df['last_name'].replace('[', '') for each unwanted character, but it didn't work.

Desired output

 first_name      last_name
   Victor        De La Cruz
   Ashley        Smith
   Angel         Miguel Hernandez
   Hank          Hill

Any suggestions would be helpful. Thank you!

Answer 1

Score: 1

Just join the list items back into a single string:

df['last_name'] = df['last_name'].str.join(' ')

After split(), your series holds list objects, not strings, which is why .replace() has no effect: the brackets are only part of how each list is displayed, not characters in a string.
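To see the difference, it can help to check the element type before and after the join. The following is a minimal sketch, assuming the names live in a column called name (a hypothetical column name; the question does not show one):

```python
import pandas as pd

# Hypothetical column name "name"; the question does not show the real one.
df = pd.DataFrame({"name": ["Victor De La Cruz", "Ashley Smith",
                            "Angel Miguel Hernandez", "Hank Hill"]})

parts = df["name"].str.split()       # each element is a Python list of words
df["first_name"] = parts.str[0]      # first word, a plain string
df["last_name"] = parts.str[1:]      # remaining words, still a list

print(type(df.loc[0, "last_name"]))  # a list at this point

# .str.join concatenates the items of each list using the given separator
df["last_name"] = df["last_name"].str.join(" ")

print(type(df.loc[0, "last_name"]))  # now a string
print(df[["first_name", "last_name"]])
```

Splitting once and reusing parts avoids running the split twice, although the result is the same either way.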

Answer 2

Score: 1

I'd suggest using the n keyword argument to limit the split to the first space only. You can also pass expand=True to get the two parts back as separate columns:

import pandas as pd

s = pd.Series([
    'Victor De La Cruz',
    'Ashley Smith',
    'Angel Miguel Hernandez',
    'Hank Hill'
])

df = s.str.split(n=1, expand=True)
df.columns = ["first_name", "last_name"]

Output

  first_name         last_name
0     Victor        De La Cruz
1     Ashley             Smith
2      Angel  Miguel Hernandez
3       Hank              Hill
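One edge case worth checking is a name with no space at all. The sketch below uses a hypothetical input ("Cher" is not in the original data) and assumes an empty string is an acceptable stand-in for a missing last name:

```python
import pandas as pd

# Hypothetical input: "Cher" has no last name to split off.
s = pd.Series(["Victor De La Cruz", "Cher"])

df = s.str.split(n=1, expand=True)
df.columns = ["first_name", "last_name"]

# Rows with nothing after the first word hold a missing value in the
# second column; replacing it with "" is an assumption of this sketch.
df["last_name"] = df["last_name"].fillna("")
print(df)
```

If no row contains a space at all, the expanded result has only one column, so assigning two column names would raise an error; that case would need separate handling.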



huangapple
  • Posted on May 30, 2023, 01:10:15
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76359178.html