How to create new pandas columns from a pandas data frame column content

Question

I have the following pandas dataframe:

import pandas as pd
import numpy as np


ds1 = {'Name': ["Juan Pablo Montoya", "Jose Hiero", "Martin Vasquez"], 'Comments': ["DOB 18 May 1967; POB Mexico.", "POB Mexico.", "-0-"]}

df1 = pd.DataFrame(data=ds1)

Which looks like this:

print(df1)
                 Name                      Comments
0  Juan Pablo Montoya  DOB 18 May 1967; POB Mexico.
1          Jose Hiero                   POB Mexico.
2      Martin Vasquez                           -0-

I need to create two new columns based on the contents of the Comments column.
The names of the two new columns are:

  • DOB
  • POB

The value in column DOB is the text that follows "DOB" in the Comments column (up to, but not including, the semicolon).
The value in column POB is the text that follows "POB" in the Comments column (up to, but not including, the dot).
If the Comments value is "-0-", then both new columns contain NaN.

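So, from the example above, the resulting data frame would look like this:

                 Name                      Comments          DOB     POB
0  Juan Pablo Montoya  DOB 18 May 1967; POB Mexico.  18 May 1967  Mexico
1          Jose Hiero                   POB Mexico.          NaN  Mexico
2      Martin Vasquez                           -0-          NaN     NaN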

Answer 1

Score: 1

It can be done with regex parsing:

df1['DOB'] = df1['Comments'].str.extract(r'DOB (\d{1,2} \w+ \d{4})')
df1['POB'] = df1['Comments'].str.extract(r'POB ([a-zA-Z\- .]+)')
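To see what these two patterns capture on the sample data, here is a small self-contained sketch. The `expand=False` argument and the alternative pattern `POB ([^.;]+)` are additions for illustration and are not part of the answer above. The character class in the answer's POB pattern includes `.`, so it appears to keep the trailing period; the assumed variant stops at the first dot or semicolon instead.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Name": ["Juan Pablo Montoya", "Jose Hiero", "Martin Vasquez"],
    "Comments": ["DOB 18 May 1967; POB Mexico.", "POB Mexico.", "-0-"],
})

# expand=False returns a Series (instead of a one-column DataFrame)
# when the pattern has a single capture group.
df1["DOB"] = df1["Comments"].str.extract(r"DOB (\d{1,2} \w+ \d{4})", expand=False)

# Pattern from the answer: '.' is inside the character class,
# so the trailing period is captured along with the place name.
pob_with_dot = df1["Comments"].str.extract(r"POB ([a-zA-Z\- .]+)", expand=False)

# Assumed variant (not from the answer): capture everything up to
# the first '.' or ';' so the period is left out.
df1["POB"] = df1["Comments"].str.extract(r"POB ([^.;]+)", expand=False)

print(pob_with_dot.tolist())
print(df1[["Name", "DOB", "POB"]])
```

Rows where a pattern does not match, such as the "-0-" comment, come back as NaN, so no separate replacement step is needed for that case.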

Answer 2

Score: 1

You can do this with str.extract():

df1['DOB'] = df1['Comments'].str.extract('DOB (.*);+')
df1['POB'] = df1['Comments'].str.extract('POB (.*)')

Output:

                 Name                      Comments          DOB      POB
0  Juan Pablo Montoya  DOB 18 May 1967; POB Mexico.  18 May 1967  Mexico.
1          Jose Hiero                   POB Mexico.          NaN  Mexico.
2      Martin Vasquez                           -0-          NaN      NaN
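Note that the POB values in this output still end with a period, and that `(.*)` is greedy. The sketch below compares greedy and non-greedy capture groups; the second comment string is hypothetical (it is not in the original data) and is there only to show what can happen when a comment contains more than one semicolon. The non-greedy patterns and `expand=False` are assumptions added for illustration, not part of the answer above.

```python
import pandas as pd

comments = pd.Series([
    "DOB 18 May 1967; POB Mexico.",
    # Hypothetical comment with two semicolons, for illustration only
    "DOB 02 Jan 1970; nationality Mexican; POB Mexico.",
    "-0-",
])

# Greedy: '.*' runs as far as it can, so it stops at the LAST semicolon.
greedy_dob = comments.str.extract(r"DOB (.*);", expand=False)

# Non-greedy: '.*?' stops at the FIRST semicolon.
lazy_dob = comments.str.extract(r"DOB (.*?);", expand=False)

# Non-greedy up to an escaped dot, so the period is not captured.
# The backslash matters: an unescaped '.' would match any character.
lazy_pob = comments.str.extract(r"POB (.*?)\.", expand=False)

print(greedy_dob.tolist())
print(lazy_dob.tolist())
print(lazy_pob.tolist())
```

On the original three-row sample both DOB patterns give the same result, since each comment has at most one semicolon; the difference only shows up on inputs like the hypothetical second row.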

huangapple
  • Published on June 15, 2023 at 21:33:26
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76483034.html