April 11, 2023, 13:35:05

Python code to extract a record from a DataFrame read from Excel based on a condition and add it as a column value

Question

I am trying to extract a record from the dataframe based on a condition and convert it into a column. I have tried the code below:

import pandas as pd
df = pd.read_excel("C:/Users/0403 - Copy (3).xlsx")

def add_action(row):
    if row["FILE TYPE"] == 'NaN':
        return row['Row Labels']

df = df.assign(Label_header=df.apply(add_action, axis=1))

Input dataframe

  | FILE TYPE | Row Labels             | TOTAL COUNT |
0 | NaN       | CHANGE                 | 668         |
1 | 87I       | ei.cfc.h87.n5.20211    | 98          |
2 | 87P       | ep.cfc.m87.n5.2023     | 570         |
3 | NaN       | CHN                    | 5642        |
4 | 87P       | NMMCMS.AC.2021_1_R.txt | 1           |
5 | 87P       | NS.AC.201_1_R.txt      | 1           |

Expected output dataframe

  | FILE TYPE | Row Labels             | TOTAL COUNT | Label_header |
0 | 87I       | ei.cfc.h87.n5.20211    | 98          | CHANGE       |
1 | 87P       | ep.cfc.m87.n5.2023     | 570         | CHANGE       |
2 | 87P       | NMMCMS.AC.2021_1_R.txt | 1           | CHN          |
3 | 87P       | NS.AC.201_1_R.txt      | 1           | CHN          |


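Answer 1

Score: 1

Use Series.isna to build a mask of the rows where FILE TYPE is missing. Series.where keeps the Row Labels values only on those rows, and Series.ffill carries each label forward to the rows below it. Finally, boolean indexing with the mask inverted by ~ drops the header rows: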

m = df["FILE TYPE"].isna()

df['Label_header'] = df['Row Labels'].where(m).ffill()

out = df[~m].reset_index(drop=True)
print(out)
  FILE TYPE              Row Labels  TOTAL COUNT Label_header
0       87I     ei.cfc.h87.n5.20211           98       CHANGE
1       87P      ep.cfc.m87.n5.2023          570       CHANGE
2       87P  NMMCMS.AC.2021_1_R.txt            1          CHN
3       87P       NS.AC.201_1_R.txt            1          CHN
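To make the intermediate steps easier to follow, here is a minimal self-contained sketch. It assumes the sample data from the question, built directly in code rather than read from the Excel file. The names `headers` and `filled` are introduced here only for illustration and are not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question (built in code instead of read from Excel).
df = pd.DataFrame({
    "FILE TYPE": [np.nan, "87I", "87P", np.nan, "87P", "87P"],
    "Row Labels": ["CHANGE", "ei.cfc.h87.n5.20211", "ep.cfc.m87.n5.2023",
                   "CHN", "NMMCMS.AC.2021_1_R.txt", "NS.AC.201_1_R.txt"],
    "TOTAL COUNT": [668, 98, 570, 5642, 1, 1],
})

m = df["FILE TYPE"].isna()            # True on the group-header rows (0 and 3)
headers = df["Row Labels"].where(m)   # keeps CHANGE / CHN, NaN on every other row
filled = headers.ffill()              # repeats each header down to the next header

# Attach the filled labels, drop the header rows, and renumber the index.
out = df.assign(Label_header=filled)[~m].reset_index(drop=True)
print(out)
```

Printing `headers` and `filled` separately shows how the labels spread: `where` blanks out the data rows, and `ffill` then fills each blank with the nearest header above it.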

Alternatively, chain the same steps into a single expression:

m = df["FILE TYPE"].isna()

out = df.assign(Label_header=df['Row Labels'].where(m).ffill())[~m].reset_index(drop=True)
print(out)
  FILE TYPE              Row Labels  TOTAL COUNT Label_header
0       87I     ei.cfc.h87.n5.20211           98       CHANGE
1       87P      ep.cfc.m87.n5.2023          570       CHANGE
2       87P  NMMCMS.AC.2021_1_R.txt            1          CHN
3       87P       NS.AC.201_1_R.txt            1          CHN
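As a side note on why the attempt in the question returns nothing useful: the check `row["FILE TYPE"] == 'NaN'` compares against a string, whereas an empty Excel cell typically arrives from pd.read_excel as a float NaN. The sketch below illustrates the difference on a two-row frame. The variable names `missing`, `small`, and `labels` are assumptions made for this example.

```python
import numpy as np
import pandas as pd

missing = np.nan  # what an empty Excel cell typically becomes after pd.read_excel

print(missing == "NaN")    # False: a float NaN is not the string 'NaN'
print(missing == missing)  # False: NaN does not even compare equal to itself
print(pd.isna(missing))    # True: pd.isna is the reliable missing-value test


def add_action(row):
    # Row-wise variant of the question's function, using pd.isna for the check.
    if pd.isna(row["FILE TYPE"]):
        return row["Row Labels"]
    return None


small = pd.DataFrame({
    "FILE TYPE": [np.nan, "87I"],
    "Row Labels": ["CHANGE", "ei.cfc.h87.n5.20211"],
})
labels = small.apply(add_action, axis=1)
print(labels)
```

Even with this fix, the row-wise function labels only the header rows themselves, so the forward fill and the filtering step from the answer are still needed. The vectorised isna / where / ffill version also avoids calling a Python function once per row.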

