June 15, 2023 21:33:26


How to create new pandas columns from a pandas data frame column content

Question

import pandas as pd
import numpy as np

ds1 = {'Name': ["Juan Pablo Montoya", "Jose Hiero", "Martin Vasquez"], 'Comments': ["DOB 18 May 1967; POB Mexico.", "POB Mexico.", "-0-"]}

df1 = pd.DataFrame(data=ds1)

# Extract DOB and POB values
df1['DOB'] = df1['Comments'].str.extract(r'DOB (.*?);')
df1['POB'] = df1['Comments'].str.extract(r'POB (.*?)\.')

# Replace '-0-' cells with NaN (DOB/POB are already NaN where nothing matched)
df1.replace('-0-', np.nan, inplace=True)

# Display the resulting DataFrame
print(df1)

The result looks like this:

                 Name                      Comments          DOB     POB
0  Juan Pablo Montoya  DOB 18 May 1967; POB Mexico.  18 May 1967  Mexico
1          Jose Hiero                   POB Mexico.          NaN  Mexico
2      Martin Vasquez                           NaN          NaN     NaN
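In a pattern like `r'POB (.*?).'` the final `.` is unescaped, so it matches any character and the lazy group `(.*?)` is satisfied by an empty match. Escaping it as `\.` restricts it to a literal period. A quick check with Python's standard `re` module (same regex syntax that `str.extract` accepts) shows the difference:

```python
import re

text = "POB Mexico."

# Unescaped ".": matches ANY character, so the lazy group (.*?) can stay empty
unescaped = re.search(r"POB (.*?).", text).group(1)

# Escaped "\.": matches only a literal period, so the lazy group grows up to it
escaped = re.search(r"POB (.*?)\.", text).group(1)

print(repr(unescaped))  # ''
print(repr(escaped))    # 'Mexico'
```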

Original question (English):

I have the following pandas dataframe:

import pandas as pd
import numpy as np


ds1 = {'Name':["Juan Pablo Montoya","Jose Hiero","Martin Vasquez"], "Comments" : ["DOB 18 May 1967; POB Mexico.","POB Mexico.","-0-"]}

df1 = pd.DataFrame(data=ds1)

Which looks like this:

print(df1)
                 Name                      Comments
0  Juan Pablo Montoya  DOB 18 May 1967; POB Mexico.
1          Jose Hiero                   POB Mexico.
2      Martin Vasquez                           -0-

I need to create two new columns based on the contents of the Comments column.
The names of the two new columns are:

DOB
POB

The value in column DOB is the text that follows "DOB" in the Comments column, up to the semicolon.
The value in column POB is the text that follows "POB" in the Comments column, up to the period.
If the Comments value is "-0-", both new columns contain NaN.

So, from the example above, the resulting data frame would look like this:

Answer 1

Score: 1

It can be done with regex parsing:

df1['DOB'] = df1['Comments'].str.extract(r'DOB (\d{1,2} \w+ \d{4})')
df1['POB'] = df1['Comments'].str.extract(r'POB ([a-zA-Z\- ]+)')
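A variation on this answer (not from the original thread) uses named groups, which `str.extract` turns into column names, so both new columns can be added with a single `join`. The character classes `[^;]+` and `[^.]+` assume that a DOB value never contains a semicolon and a POB value never contains a period; this is a sketch under that assumption:

```python
import pandas as pd

ds1 = {"Name": ["Juan Pablo Montoya", "Jose Hiero", "Martin Vasquez"],
       "Comments": ["DOB 18 May 1967; POB Mexico.", "POB Mexico.", "-0-"]}
df1 = pd.DataFrame(data=ds1)

# Named groups (?P<name>...) become the column names of the extracted frames
dob = df1["Comments"].str.extract(r"DOB (?P<DOB>[^;]+);")
pob = df1["Comments"].str.extract(r"POB (?P<POB>[^.]+)\.")

# Rows with no match (e.g. "-0-") get NaN; join aligns the new columns on the index
df1 = df1.join(pd.concat([dob, pob], axis=1))
print(df1)
```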

Answer 2

Score: 1

You can do this with str.extract():

df1['DOB'] = df1['Comments'].str.extract('DOB (.*);+')
df1['POB'] = df1['Comments'].str.extract('POB (.*)')

Output:

    Name                Comments                      DOB          POB
0   Juan Pablo Montoya  DOB 18 May 1967; POB Mexico.  18 May 1967  Mexico.
1   Jose Hiero          POB Mexico.                   NaN          Mexico.
2   Martin Vasquez      -0-                           NaN          NaN
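Note that `(.*)` is greedy: the POB value above keeps its trailing period, and the DOB pattern would run past the first semicolon if a comment contained more than one. The sketch below shows both effects and one way to drop the period with `.str.rstrip`. The two-semicolon comment string is hypothetical and not part of the question's data:

```python
import pandas as pd

comments = pd.Series(["DOB 18 May 1967; POB Mexico.", "POB Mexico.", "-0-"])

# (.*) also captures the trailing period; .str.rstrip(".") removes it (NaN stays NaN)
pob = comments.str.extract(r"POB (.*)", expand=False).str.rstrip(".")

# Hypothetical comment with two semicolons, to contrast greedy and lazy matching
extra = pd.Series(["DOB 18 May 1967; POB Mexico; alias JP."])
greedy = extra.str.extract(r"DOB (.*);+", expand=False)  # runs to the LAST ";"
lazy = extra.str.extract(r"DOB (.*?);", expand=False)    # stops at the FIRST ";"

print(pob.tolist())
print(greedy.iloc[0])  # 18 May 1967; POB Mexico
print(lazy.iloc[0])    # 18 May 1967
```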

