Column Update from multiple columns and column headers in Pandas

Question

I have a specific problem (I am not sure whether it is challenging for the pros and experts here, but it seems pretty formidable to me): I need to update a column based on conditions that involve both the column headers and the values in another column.

Here are some specific rows of the input DataFrame as an example:

df-in

    A    B    ind    M1P    M2P    M3P    M4P    M5P

    x    a    2      0      0      3      5      9
    y    b    2      Nan    Nan    Nan    7      11
    z    c    2      0      Nan    0      3      3
    w    d    2      0      0      0      Nan    8
    u    q    2      0      0      0      Nan    0

Based on the value in the 'ind' column, I need to check the matching MxP column (where x can be 1, 2, 3, 4, or 5). In the example above, every value in 'ind' is 2, so I need to check M2P and the columns after it (I do not care about M1P here, but if 'ind' were 1 I would have to check M1P). If M2P is 0, NaN, or blank, it takes the value from M3P. If M3P is also blank, 0, or null, it takes the value from M4P. If M4P is also blank, null, or 0, it takes the value from M5P. If M5P is blank, 0, or NaN as well, the value in M2P stays as it is. The same logic applies for every other value of 'ind'; if 'ind' is 5, it does not look anywhere else.

So the output of the above should be:

df-out

    A    B    ind    M1P    M2P    M3P    M4P    M5P

    x    a    2      0      3      3      5      9
    y    b    2      Nan    7      Nan    7      11
    z    c    2      0      3      0      3      3
    w    d    2      0      8      0      Nan    8
    u    q    2      0      0      0      Nan    0

I am still struggling to figure out the best way to approach this problem in pandas. Any help, code, or ideas would be greatly appreciated.

Answer 1

Score: 1

Use:

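import numpy as np

# select the columns whose names match the M + number + P pattern
df1 = df.filter(regex=r'M\d+P')
# extract the number from each column name and convert it to an integer
cols = df1.columns.str.extract(r'(\d+)', expand=False).astype(int)

# boolean mask: True where a column's number equals the row's ind value
m = cols.to_numpy() == df['ind'].to_numpy()[:, None]

# treat 'Nan' and 0 as missing, back fill along each row, keep only the masked
# cells, and write the non-missing results back into df
df.update(df1.replace(['Nan', 0, '0'], np.nan).bfill(axis=1).where(m))
print(df)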
   A  B  ind  M1P  M2P  M3P  M4P  M5P
0  x  a    2    0    3    3    5    9
1  y  b    2  Nan    7  Nan    7   11
2  z  c    2    0    3    0    3    3
3  w  d    2    0  8.0    0  Nan    8
4  u  q    2    0    0    0  Nan    0
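
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the sample rows from the question, assuming the missing entries are the literal string 'Nan' (as the replace call above suggests). It takes a slightly different route than the one-liner: pd.to_numeric with errors='coerce' and an explicit per-column assignment instead of df.update. The function name fill_from_later_columns is made up for illustration, and the M columns are assumed to appear in ascending order from left to right.

```python
import re

import numpy as np
import pandas as pd


def fill_from_later_columns(df):
    """Hypothetical helper: fill the M<ind>P cell from the next usable column to its right."""
    out = df.copy()
    # M columns, assumed to appear in ascending order (M1P, M2P, ...)
    m_cols = [c for c in out.columns if re.fullmatch(r"M\d+P", c)]
    numbers = np.array([int(c[1:-1]) for c in m_cols])

    # Coerce to floats: non-numeric text such as 'Nan' becomes NaN
    values = out[m_cols].apply(pd.to_numeric, errors="coerce").astype(float)
    # Treat 0 as missing, then back fill along each row
    filled = values.mask(values == 0).bfill(axis=1)

    # True where the column number equals the row's ind value
    target = numbers == out["ind"].to_numpy()[:, None]
    # Only write where a fill value actually exists
    write = target & filled.notna().to_numpy()
    for j, col in enumerate(m_cols):
        rows = write[:, j]
        if rows.any():
            out.loc[rows, col] = filled.loc[rows, col]
    return out


# Sample rows from the question; 'Nan' is assumed to be a literal string
df = pd.DataFrame({
    "A": ["x", "y", "z", "w", "u"],
    "B": ["a", "b", "c", "d", "q"],
    "ind": [2, 2, 2, 2, 2],
    "M1P": [0, "Nan", 0, 0, 0],
    "M2P": [0, "Nan", "Nan", 0, 0],
    "M3P": [3, "Nan", 0, 0, 0],
    "M4P": [5, 7, 3, "Nan", "Nan"],
    "M5P": [9, 11, 3, 8, 0],
})

result = fill_from_later_columns(df)
print(result)
```

The helper returns a copy, so the original df is left untouched, and a row whose remaining columns are all 0 or missing (row u here) keeps its original value.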

huangapple
  • Posted on June 22, 2023 at 14:37:29
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76529155.html