Pandas split a column in a DF based on a condition and write back to the column
Question
I need to split a column in a DataFrame on '-', keeping the last part when a '-' is present and otherwise keeping the original value (including NaN), then write the result back to the DataFrame. The column may contain values with '-' as well as np.nan. How can I achieve this?
Code:
import pandas as pd
import numpy as np
# Sample df
data = [[1, '1010'], [2, 'CORP-1030'], [3, 'LLC-1020'], [4, np.nan], [5, '1040']]  # np.nan: the np.NaN alias was removed in NumPy 2.0
df = pd.DataFrame(data, columns=['ID', 'Sector'])
   ID     Sector
0   1       1010
1   2  CORP-1030
2   3   LLC-1020
3   4        NaN
4   5       1040
This is what I came up with. It is not clean or readable, so I'm looking for a better solution.
df[['Sector Name', 'Sector Code']] = df['Sector'].apply(lambda x: pd.Series(str(x).split("-")))
df.drop(['Sector Name'], errors='ignore', inplace=True, axis=1)
df['Sector Code'] = df['Sector Code'].fillna(df['Sector'])
df.drop('Sector', inplace=True, axis=1)
Desired output:
   ID Sector
0   1   1010
1   2   1030
2   3   1020
3   4    NaN
4   5   1040
Answer 1
Score: 1
That's fairly simple with the str accessor:
df['new'] = df['Sector'].str.split('-').str[-1]
Result
   ID     Sector   new
0   1       1010  1010
1   2  CORP-1030  1030
2   3   LLC-1020  1020
3   4        NaN   NaN
4   5       1040  1040
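For completeness, here is a minimal self-contained sketch of the same idea that writes the result back to the original Sector column, as the question asks, instead of creating a new column. It rebuilds the sample frame from the question. The extra series with an 'A-B-1050' value and the name last_part are illustrative assumptions, not part of the original data. They are there only to show that rsplit with n=1 also yields the last piece when a value contains more than one hyphen.

```python
import numpy as np
import pandas as pd

# Sample frame from the question
df = pd.DataFrame(
    {
        "ID": [1, 2, 3, 4, 5],
        "Sector": ["1010", "CORP-1030", "LLC-1020", np.nan, "1040"],
    }
)

# Split on '-', keep the last piece, and overwrite the column in place.
# Values without '-' come back unchanged; missing values stay missing.
df["Sector"] = df["Sector"].str.split("-").str[-1]

# Hypothetical extra data: rsplit with n=1 splits only once, from the right,
# which is enough when only the final piece is needed.
extra = pd.Series(["A-B-1050", "1060", np.nan])
last_part = extra.str.rsplit("-", n=1).str[-1]

print(df)
print(last_part)
```

Because the str accessor propagates missing values, no separate fillna step is needed.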
Answer 2
Score: 0
With a simple regex replacement that removes a leading part followed by '-':
df['Sector'] = df['Sector'].str.replace(r'^[^-]+-', '', regex=True)
   ID Sector
0   1   1010
1   2   1030
2   3   1020
3   4    NaN
4   5   1040
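One detail worth checking before relying on this pattern: ^[^-]+- strips only the first hyphen-delimited prefix. That matches the sample data, where each value has at most one hyphen. The sketch below uses a hypothetical value 'A-B-1050' (not in the original data) to show the difference, and a greedy alternative ^.*- that removes everything through the last hyphen. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

# 'A-B-1050' is a made-up value with two hyphens, added for illustration
s = pd.Series(["1010", "CORP-1030", "A-B-1050", np.nan])

# Pattern from the answer: removes only the first "prefix-" segment
first_prefix_removed = s.str.replace(r"^[^-]+-", "", regex=True)

# Greedy alternative: removes everything up to and including the last '-'
through_last_hyphen = s.str.replace(r"^.*-", "", regex=True)

print(first_prefix_removed.tolist())
print(through_last_hyphen.tolist())
```

If the column can contain values with several hyphens and only the final part is wanted, the greedy pattern or the split-based approach is likely the safer choice. For single-hyphen data all three behave the same.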