How to create new pandas columns from a pandas data frame column content
Question
import pandas as pd
import numpy as np
ds1 = {'Name': ["Juan Pablo Montoya", "Jose Hiero", "Martin Vasquez"], 'Comments': ["DOB 18 May 1967; POB Mexico.", "POB Mexico.", "-0-"]}
df1 = pd.DataFrame(data=ds1)
# Extract DOB and POB values
df1['DOB'] = df1['Comments'].str.extract(r'DOB (.*?);')
df1['POB'] = df1['Comments'].str.extract(r'POB (.*?)\.')  # escaped dot matches a literal '.'
# Rows whose comment is '-0-' get NaN in both new columns (Comments itself is left unchanged)
df1.loc[df1['Comments'] == '-0-', ['DOB', 'POB']] = np.nan
# Display the resulting DataFrame
print(df1)
The result looks like this:
Name Comments DOB POB
0 Juan Pablo Montoya DOB 18 May 1967; POB Mexico. 18 May 1967 Mexico
1 Jose Hiero POB Mexico. NaN Mexico
2 Martin Vasquez -0- NaN NaN
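A variant sketch, not taken from the post: both fields can be pulled out in a single str.extract call by using named groups. The group names become the new column names. The sketch assumes each comment starts with the DOB and/or POB fields, with DOB first, as in the sample data. A comment with other text in front of those fields would not be parsed by this pattern.

```python
import pandas as pd

ds1 = {
    "Name": ["Juan Pablo Montoya", "Jose Hiero", "Martin Vasquez"],
    "Comments": ["DOB 18 May 1967; POB Mexico.", "POB Mexico.", "-0-"],
}
df1 = pd.DataFrame(data=ds1)

# Two optional named groups; each capture stops at its delimiter:
# DOB at the first ';', POB at the first '.'.
pattern = r"(?:DOB (?P<DOB>[^;]+);)?\s*(?:POB (?P<POB>[^.]+)\.)?"

# With named groups, str.extract returns a DataFrame with columns DOB and POB.
# Groups that did not take part in the match (e.g. for "-0-") become NaN.
extracted = df1["Comments"].str.extract(pattern)
df1 = df1.join(extracted)
print(df1)
```

Compared with two separate extract calls, this scans each comment once, at the cost of a pattern that is harder to read and maintain.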
Original question (English):
I have the following pandas dataframe:
import pandas as pd
import numpy as np
ds1 = {'Name':["Juan Pablo Montoya","Jose Hiero","Martin Vasquez"], "Comments" : ["DOB 18 May 1967; POB Mexico.","POB Mexico.","-0-"]}
df1 = pd.DataFrame(data=ds1)
Which looks like this:
print(df1)
Name Comments
0 Juan Pablo Montoya DOB 18 May 1967; POB Mexico.
1 Jose Hiero POB Mexico.
2 Martin Vasquez -0-
I need to create two new columns based on the contents of the Comments column.
The names of the two new columns are:
DOB
POB
The values in column DOB are the text following "DOB" in the Comments column (up to the semicolon).
The values in column POB are the text following "POB" in the Comments column (up to the dot).
If the Comments value is "-0-", both new columns contain NaN.
So, from the example above, the resulting data frame would look like this:
Answer 1
Score: 1
It can be done with regex parsing:
df1['DOB'] = df1['Comments'].str.extract(r'DOB (\d{1,2} \w+ \d{4})')
df1['POB'] = df1['Comments'].str.extract(r'POB ([a-zA-Z\- .]+)')
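Because the POB character class in this answer includes ".", the captured value keeps the final period ("Mexico."), while the question asks for the text up to the dot. A minimal follow-up sketch, assuming you want to keep the answer's pattern (which would also accept dotted place names such as a hypothetical "St. Louis"), is to strip a trailing period afterwards with Series.str.rstrip. The expand=False argument is an addition here so that each call returns a Series.

```python
import pandas as pd

ds1 = {
    "Name": ["Juan Pablo Montoya", "Jose Hiero", "Martin Vasquez"],
    "Comments": ["DOB 18 May 1967; POB Mexico.", "POB Mexico.", "-0-"],
}
df1 = pd.DataFrame(data=ds1)

# Answer 1's patterns (expand=False so each call returns a Series).
df1["DOB"] = df1["Comments"].str.extract(r"DOB (\d{1,2} \w+ \d{4})", expand=False)
df1["POB"] = df1["Comments"].str.extract(r"POB ([a-zA-Z\- .]+)", expand=False)

# The POB class allows '.', so "Mexico." keeps its period; drop trailing dots.
# Rows with no match (e.g. "-0-") are NaN and stay NaN under .str methods.
df1["POB"] = df1["POB"].str.rstrip(".")

print(df1)
```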
Answer 2
Score: 1
You can do this with str.extract():
df1['DOB'] = df1['Comments'].str.extract('DOB (.*);+')
df1['POB'] = df1['Comments'].str.extract('POB (.*)')
Output:
Name Comments DOB POB
0 Juan Pablo Montoya DOB 18 May 1967; POB Mexico. 18 May 1967 Mexico.
1 Jose Hiero POB Mexico. NaN Mexico.
2 Martin Vasquez -0- NaN NaN
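These patterns rely on greedy ".*": "DOB (.*);+" backtracks to the last semicolon in the string, and "POB (.*)" runs to the end, which is why the output keeps the period ("Mexico."). That is fine for the sample data, which has at most one semicolon per comment. The sketch below checks the regex behaviour directly with Python's re module, using a hypothetical comment with an extra field (not from the question), and compares it with negated character classes such as "[^;]*", which stop at the first delimiter.

```python
import re

# Hypothetical comment with an extra ';'-separated field (not in the original data).
text = "DOB 18 May 1967; POB Mexico; ALIAS JP."

# Answer 2's patterns: greedy .* consumes as much as it can.
greedy_dob = re.search(r"DOB (.*);+", text).group(1)
greedy_pob = re.search(r"POB (.*)", text).group(1)

# Negated character classes stop at the first delimiter instead.
strict_dob = re.search(r"DOB ([^;]*)", text).group(1)
strict_pob = re.search(r"POB ([^;.]*)", text).group(1)

print(greedy_dob)  # 18 May 1967; POB Mexico
print(greedy_pob)  # Mexico; ALIAS JP.
print(strict_dob)  # 18 May 1967
print(strict_pob)  # Mexico
```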