2023-03-21 00:50:21

Pandas split a column in a DF based on a condition and write back to the column

Question

I need to split a column in a DataFrame on '-'. If a value contains '-', keep only the last part; otherwise keep the original value, including NaN. The result should be written back to the same column. The column may contain both hyphenated strings and np.nan. How can I achieve this?

Code:

import pandas as pd
import numpy as np

# Sample df
data = [[1, '1010'], [2, 'CORP-1030'], [3, 'LLC-1020'], [4, np.nan], [5, '1040']]
df = pd.DataFrame(data, columns=['ID', 'Sector'])

   ID     Sector
0   1       1010
1   2  CORP-1030
2   3   LLC-1020
3   4        NaN
4   5       1040

This is what I came up with. It is not clean or readable, so I am looking for a better solution.

df[['Sector Name', 'Sector Code']] = df['Sector'].apply(lambda x: pd.Series(str(x).split("-")))
df.drop(['Sector Name'], errors='ignore', inplace=True, axis=1)
df['Sector Code'].fillna(df['Sector'], inplace=True)
df.drop('Sector', inplace=True, axis=1)

Desired output:

   ID      Sector
0   1        1010
1   2        1030
2   3        1020
3   4         NaN
4   5        1040

Answer 1

Score: 1

That's fairly simple with the str accessor:

df['new'] = df['Sector'].str.split('-').str[-1]

Result:

   ID     Sector   new
0   1       1010  1010
1   2  CORP-1030  1030
2   3   LLC-1020  1020
3   4        NaN   NaN
4   5       1040  1040
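
The question asks for the result to be written back to the original column rather than to a new one. A minimal sketch of that variant, assuming the same sample frame as in the question; using `str.rsplit` with `n=1` is a swapped-in alternative to the `str.split` call above, not part of the original answer:

```python
import numpy as np
import pandas as pd

# Same sample frame as in the question.
df = pd.DataFrame(
    {
        "ID": [1, 2, 3, 4, 5],
        "Sector": ["1010", "CORP-1030", "LLC-1020", np.nan, "1040"],
    }
)

# Split at most once, starting from the right, and keep the last piece.
# Values without a hyphen yield a one-element list, so they stay unchanged;
# NaN is passed through by the .str accessor.
df["Sector"] = df["Sector"].str.rsplit("-", n=1).str[-1]

print(df)
```

Assigning to `df["Sector"]` directly avoids the helper columns and the `drop` calls from the question.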

Answer 2

Score: 0

A simple regex replacement removes the leading part up to and including the first '-':

df['Sector'] = df['Sector'].str.replace(r'^[^-]+-', '', regex=True)

   ID Sector
0   1   1010
1   2   1030
2   3   1020
3   4    NaN
4   5   1040
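
The two answers agree on the sample data but can differ when a value contains more than one hyphen. The sketch below compares them on a hypothetical value, `'A-B-1030'`, which does not appear in the question; the greedy pattern `^.*-` is likewise an illustrative assumption, not something either answer proposed:

```python
import numpy as np
import pandas as pd

# Hypothetical values; "A-B-1030" is added only to show the difference.
s = pd.Series(["1010", "CORP-1030", "A-B-1030", np.nan])

# Pattern from Answer 2: removes text up to and including the FIRST hyphen.
first_prefix_removed = s.str.replace(r"^[^-]+-", "", regex=True)

# Greedy variant: removes text up to and including the LAST hyphen.
all_prefixes_removed = s.str.replace(r"^.*-", "", regex=True)

# Approach from Answer 1: keeps the piece after the LAST hyphen.
last_piece = s.str.split("-").str[-1]

print(pd.DataFrame({
    "original": s,
    "first_prefix_removed": first_prefix_removed,
    "all_prefixes_removed": all_prefixes_removed,
    "last_piece": last_piece,
}))
```

If the data can contain multi-hyphen values and only the last part is wanted, the split-based approach or the greedy pattern is probably the closer match to the stated goal.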
