June 22, 2023, 14:37:29

Column Update from multiple columns and column headers in Pandas

Question

I have a specific problem (I am not sure how challenging it is for the pros and experts here, but it seems pretty formidable to me): I need to update a column based on conditions that involve both the column headers and the values in a column.

Here are some specific rows of the input DataFrame as an example:

    df-in

    A  B  ind  M1P  M2P  M3P  M4P  M5P
    x  a    2    0    0    3    5    9
    y  b    2  Nan  Nan  Nan    7   11
    z  c    2    0  Nan    0    3    3
    w  d    2    0    0    0  Nan    8
    u  q    2    0    0    0  Nan    0

Based on the value in the 'ind' column, I need to check the columns MxP (where x can be 1, 2, 3, 4, or 5). In the example above, every value in 'ind' is 2, so I need to check M2P and the columns after it (I do not care about M1P here; if 'ind' were 1, I would have to start at M1P). If M2P is 0, NaN, or blank, it takes the value from M3P. If M3P is also blank, 0, or null, it takes the value from M4P. If M4P is also blank, null, or 0, it takes the value from M5P. If M5P is blank, 0, or NaN as well, the value in M2P stays as it is. The same logic is needed for the other values of 'ind'; if 'ind' is 5, for example, there is nowhere else to look.
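The rule above can be written row by row in plain Python. This is only a sketch to pin down the intended behavior, not a pandas solution. The helper names `is_empty` and `fill_from_right` are hypothetical, and it assumes that "blank" means an empty string and that "null" may appear as `None`, NaN, or the literal strings 'Nan' or 'null'.

```python
import math


def is_empty(value):
    """Treat None, NaN, blank/'Nan'/'null'/'0' strings, and zero as 'no value' (assumed rule)."""
    if value is None:
        return True
    if isinstance(value, str):
        return value.strip().lower() in ("", "nan", "null", "0")
    if isinstance(value, float) and math.isnan(value):
        return True
    return value == 0


def fill_from_right(ind, values):
    """values holds M1P..M5P for one row; return the new value for column M<ind>P."""
    # Scan from the column selected by 'ind' to the last column.
    for candidate in values[ind - 1:]:
        if not is_empty(candidate):
            return candidate
    # Nothing usable at or to the right of M<ind>P: keep the original value.
    return values[ind - 1]


# The five sample rows, all with ind == 2.
rows = [
    [0, 0, 3, 5, 9],
    ["Nan", "Nan", "Nan", 7, 11],
    [0, "Nan", 0, 3, 3],
    [0, 0, 0, "Nan", 8],
    [0, 0, 0, "Nan", 0],
]
new_m2p = [fill_from_right(2, row) for row in rows]
print(new_m2p)
```

For the sample rows this gives the M2P column of the expected output shown in df-out.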

So the output of the above should be:

    df-out

    A  B  ind  M1P  M2P  M3P  M4P  M5P
    x  a    2    0    3    3    5    9
    y  b    2  Nan    7  Nan    7   11
    z  c    2    0    3    0    3    3
    w  d    2    0    8    0  Nan    8
    u  q    2    0    0    0  Nan    0

I am still struggling to figure out the best way to attack this problem in pandas. Any help, code, or ideas will be immensely appreciated.

Answer 1

Score: 1

Use:

import numpy as np

# select the columns whose names match the pattern M + number + P
df1 = df.filter(regex=r'M\d+P')
# extract the numbers from the column names and convert them to integers
cols = df1.columns.str.extract(r'(\d+)', expand=False).astype(int)
# boolean mask: True where a column's number equals the row's 'ind' value
m = cols.to_numpy() == df['ind'].to_numpy()[:, None]
# treat 'Nan' and zeros as missing, back-fill along each row,
# keep only the masked cells, and write the non-missing ones back into df
df.update(df1.replace(['Nan', 0, '0'], np.nan).bfill(axis=1).where(m))
print(df)
   A  B  ind  M1P  M2P  M3P  M4P  M5P
0  x  a    2    0    3    3    5    9
1  y  b    2  Nan    7  Nan    7   11
2  z  c    2    0    3    0    3    3
3  w  d    2    0  8.0    0  Nan    8
4  u  q    2    0    0    0  Nan    0
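
To try the approach end to end, here is a self-contained sketch that rebuilds the sample rows and applies the same steps. It assumes the 'Nan' entries are literal strings. It also adds the empty string to the replacement list, on the assumption that "blank" in the question means `''`; the answer's own code does not include it.

```python
import numpy as np
import pandas as pd

# Sample rows from the question; 'Nan' is assumed to be a literal string.
df = pd.DataFrame({
    "A": list("xyzwu"),
    "B": list("abcdq"),
    "ind": [2, 2, 2, 2, 2],
    "M1P": [0, "Nan", 0, 0, 0],
    "M2P": [0, "Nan", "Nan", 0, 0],
    "M3P": [3, "Nan", 0, 0, 0],
    "M4P": [5, 7, 3, "Nan", "Nan"],
    "M5P": [9, 11, 3, 8, 0],
})

# Columns named M<number>P and the integer taken from each name.
df1 = df.filter(regex=r"M\d+P")
cols = df1.columns.str.extract(r"(\d+)", expand=False).astype(int)

# One mask row per DataFrame row: True only in the column selected by 'ind'.
m = cols.to_numpy() == df["ind"].to_numpy()[:, None]

# Mark unusable values as missing ('' is an added assumption), pull the next
# usable value leftwards along each row, and keep only the masked cells.
filled = df1.replace(["Nan", "", 0, "0"], np.nan).bfill(axis=1).where(m)

# update() writes only non-missing cells, so rows with no usable value
# to the right keep their original entry.
df.update(filled)
print(df)
```

Because `update` skips missing cells, the last row keeps its original 0 in M2P, and columns other than the one selected by 'ind' are left untouched. Depending on the pandas version, some of the filled values may display as floats, and `replace` or `bfill` may emit a deprecation warning about downcasting.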
