Column Update from multiple columns and column headers in Pandas
Question
I have a specific problem (I'm not sure whether it is challenging for the pros and experts here, but it seems pretty formidable to me): I need to update a column based on conditions that involve both the column headers and the values in the columns.
Here are some rows of the input DataFrame as an example:
df-in
A  B  ind  M1P  M2P  M3P  M4P  M5P
x  a  2    0    0    3    5    9
y  b  2    Nan  Nan  Nan  7    11
z  c  2    0    Nan  0    3    3
w  d  2    0    0    0    Nan  8
u  q  2    0    0    0    Nan  0
The value in the 'ind' column tells me which MxP column to check (x can be 1, 2, 3, 4 or 5). In the example above every value of ind is 2, so I need to check M2P and the columns after it (M3P to M5P). I do not care about M1P here, but if ind were 1 I would have to check M1P as well. If M2P is 0, NaN or blank, it takes the value from M3P. If M3P is also blank, 0 or null, it takes the value from M4P. If M4P is also blank, null or 0, it takes the value from M5P. If M5P is blank/0/NaN too, the value in M2P stays as it is. The same logic has to work when ind is 1, 2, 3 or 5; if ind is 5, there is no later column, so it does not look anywhere else.
So the output of the above should be:
df-out
A  B  ind  M1P  M2P  M3P  M4P  M5P
x  a  2    0    3    3    5    9
y  b  2    Nan  7    Nan  7    11
z  c  2    0    3    0    3    3
w  d  2    0    8    0    Nan  8
u  q  2    0    0    0    Nan  0
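To pin the rule down, here is a minimal plain-Python sketch of it for a single row. It is only a reference for the expected behaviour, not a pandas solution. The helper names `is_empty` and `fill_from_right` are made up for illustration, and treating the text 'Nan' and the string '0' as empty is an assumption based on how the sample is written.

```python
import math


def is_empty(value):
    """Return True for values the question treats as missing.

    Assumption: 0, NaN, None, blank strings, and the texts 'Nan' / '0'
    all count as empty.
    """
    if value is None:
        return True
    if isinstance(value, str):
        return value.strip().lower() in ("", "nan", "0")
    return value == 0 or math.isnan(value)


def fill_from_right(values, ind):
    """Apply the rule to one row.

    values: the row's [M1P, M2P, M3P, M4P, M5P]
    ind:    1-based number of the column to update
    """
    out = list(values)
    start = ind - 1
    # Scan from the ind-th column to the right; take the first usable value.
    for candidate in values[start:]:
        if not is_empty(candidate):
            out[start] = candidate
            break
    # If nothing usable was found, the row is returned unchanged.
    return out


print(fill_from_right([0, 0, 3, 5, 9], 2))  # [0, 3, 3, 5, 9]
```

Columns to the left of the ind-th one are never read or changed, and only the ind-th cell is ever written.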
I am still struggling to figure out the best way to attack this problem in pandas, and I have not been able to work it out yet. Any help, code or ideas would be immensely appreciated.
Answer 1
Score: 1
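Use:
import numpy as np
# select the columns matching the pattern M + number + P
df1 = df.filter(regex=r'M\d+P')
# extract the number from each column name and convert to integers
cols = df1.columns.str.extract(r'(\d+)', expand=False).astype(int)
# boolean mask: True where the column number equals the row's ind
m = cols.to_numpy() == df['ind'].to_numpy()[:, None]
# treat 'Nan', 0 and '0' as missing, back fill along each row, keep only the masked cells;
# update() ignores NaN, so a row with nothing to fill from stays unchanged
df.update(df1.replace(['Nan', 0, '0'], np.nan).bfill(axis=1).where(m))
print(df)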
A B ind M1P M2P M3P M4P M5P
0 x a 2 0 3 3 5 9
1 y b 2 Nan 7 Nan 7 11
2 z c 2 0 3 0 3 3
3 w d 2 0 8.0 0 Nan 8
4 u q 2 0 0 0 Nan 0
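For anyone who wants to run this end to end, here is a self-contained sketch of the same approach on the sample data. The DataFrame construction is an assumption: the question does not show how the frame was built, so missing cells are stored here as real NaN rather than the text 'Nan', and the M columns are cast to float before replacing zeros.

```python
import numpy as np
import pandas as pd

# Sample data from the question; missing cells assumed to be real NaN.
df = pd.DataFrame({
    "A": list("xyzwu"),
    "B": list("abcdq"),
    "ind": [2, 2, 2, 2, 2],
    "M1P": [0, np.nan, 0, 0, 0],
    "M2P": [0, np.nan, np.nan, 0, 0],
    "M3P": [3, np.nan, 0, 0, 0],
    "M4P": [5, 7, 3, np.nan, np.nan],
    "M5P": [9, 11, 3, 8, 0],
})

# Columns named M<number>P, and the number carried by each of them.
df1 = df.filter(regex=r"M\d+P")
cols = df1.columns.str.extract(r"(\d+)", expand=False).astype(int)

# Broadcast (5,) against (n_rows, 1): True where column number == row's ind.
m = cols.to_numpy() == df["ind"].to_numpy()[:, None]

# Zeros become NaN, each row is back filled from the right,
# and only the cell selected by ind is kept.
filled = df1.astype(float).replace(0, np.nan).bfill(axis=1).where(m)

# update() writes only the non-NaN cells of `filled` into df, in place.
df.update(filled)

print(df["M2P"].tolist())  # [3.0, 7.0, 3.0, 8.0, 0.0]
```

The last row keeps its 0 because every candidate to the right is empty, so `filled` holds NaN there and `update` leaves the cell alone. Because the mask is built per row from `ind`, the same code should also handle frames where `ind` differs between rows.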