Python code to extract records from a DataFrame read from Excel based on a condition and insert them as column values

Question

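I am trying to extract records from the DataFrame based on a condition and turn them into a column. Each row whose FILE TYPE is NaN acts as a header: its Row Labels value should become the Label_header of the rows below it, and the header rows themselves should be dropped. I have tried the code below: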

import pandas as pd
df = pd.read_excel("C:/Users/0403 - Copy (3).xlsx")

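def add_action(row):
    # Never true for a blank cell: pandas reads it as a float NaN, not the string 'NaN'
    if row["FILE TYPE"] == 'NaN':
        return row['Row Labels']
    # Every other row implicitly returns None

df = df.assign(Label_header=df.apply(add_action, axis=1))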
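The reason the condition above never fires can be shown directly. A minimal sketch, assuming the FILE TYPE cells of the header rows are blank in the workbook (so pandas loads them as NaN); the small frame is a subset of the input data below. Even with the test corrected to `pd.isna`, a row-wise function sees only one row at a time, so the data rows still come back empty and the header value has to be carried down separately:

```python
import numpy as np
import pandas as pd

# A blank Excel cell is loaded as a float NaN, which never equals the string 'NaN'
print(np.nan == 'NaN')   # False
print(pd.isna(np.nan))   # True

# Subset of the question's data, built inline instead of read from Excel
df = pd.DataFrame({
    "FILE TYPE": [np.nan, "87I", "87P"],
    "Row Labels": ["CHANGE", "ei.cfc.h87.n5.20211", "ep.cfc.m87.n5.2023"],
})

def add_action(row):
    # Corrected test: pd.isna is the reliable missing-value check
    if pd.isna(row["FILE TYPE"]):
        return row["Row Labels"]
    return None  # data rows get no label from a per-row function alone

labels = df.apply(add_action, axis=1)
print(labels.tolist())
```

Only the header row receives a label here, which is why a forward fill over the result is still needed.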

Input dataframe

  | FILE TYPE | Row Labels             | TOTAL COUNT |
0 | NaN       | CHANGE                 | 668         |
1 | 87I       | ei.cfc.h87.n5.20211    | 98          |
2 | 87P       | ep.cfc.m87.n5.2023     | 570         |
3 | NaN       | CHN                    | 5642        |
4 | 87P       | NMMCMS.AC.2021_1_R.txt | 1           |
5 | 87P       | NS.AC.201_1_R.txt      | 1           |

Expected output dataframe

  | FILE TYPE | Row Labels             | TOTAL COUNT | Label_header |
0 | 87I       | ei.cfc.h87.n5.20211    | 98          | CHANGE       |
1 | 87P       | ep.cfc.m87.n5.2023     | 570         | CHANGE       |
2 | 87P       | NMMCMS.AC.2021_1_R.txt | 1           | CHN          |
3 | 87P       | NS.AC.201_1_R.txt      | 1           | CHN          |

Answer 1

Score: 1

Use Series.isna to test for missing values, Series.where with Series.ffill to repeat the Row Labels values selected by the mask down the following rows, and finally boolean indexing with the mask inverted by ~ to drop the header rows:

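# True where FILE TYPE is missing, i.e. on the header rows
m = df["FILE TYPE"].isna()

# Keep Row Labels only on header rows, then forward-fill them downward
df['Label_header'] = df['Row Labels'].where(m).ffill()

# Drop the header rows and renumber the index from 0
out = df[~m].reset_index(drop=True)
print(out)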
  FILE TYPE              Row Labels  TOTAL COUNT Label_header
0       87I     ei.cfc.h87.n5.20211           98       CHANGE
1       87P      ep.cfc.m87.n5.2023          570       CHANGE
2       87P  NMMCMS.AC.2021_1_R.txt            1          CHN
3       87P       NS.AC.201_1_R.txt            1          CHN
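To see what each step of the answer produces, here is a sketch that prints the intermediate values, assuming the input table from the question is built inline rather than loaded with `read_excel`; the names `headers` and `filled` are hypothetical, introduced only for illustration:

```python
import numpy as np
import pandas as pd

# Input data from the question, built inline instead of read from Excel
df = pd.DataFrame({
    "FILE TYPE": [np.nan, "87I", "87P", np.nan, "87P", "87P"],
    "Row Labels": ["CHANGE", "ei.cfc.h87.n5.20211", "ep.cfc.m87.n5.2023",
                   "CHN", "NMMCMS.AC.2021_1_R.txt", "NS.AC.201_1_R.txt"],
    "TOTAL COUNT": [668, 98, 570, 5642, 1, 1],
})

# Step 1: mask of header rows (FILE TYPE missing)
m = df["FILE TYPE"].isna()
print(m.tolist())       # [True, False, False, True, False, False]

# Step 2: Row Labels survives only on header rows; all other rows become NaN
headers = df["Row Labels"].where(m)
print(headers.tolist())

# Step 3: ffill copies each header value down until the next header
filled = headers.ffill()
print(filled.tolist())  # ['CHANGE', 'CHANGE', 'CHANGE', 'CHN', 'CHN', 'CHN']

# Step 4: attach the column, drop header rows, renumber the index
out = df.assign(Label_header=filled)[~m].reset_index(drop=True)
print(out)
```

Any data rows that appear before the first header row have nothing to fill from, so they keep NaN in Label_header.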

Alternatively, the same steps can be chained into a single expression with DataFrame.assign:

m = df["FILE TYPE"].isna()

out = df.assign(Label_header=df['Row Labels'].where(m).ffill())[~m].reset_index(drop=True)
print(out)
  FILE TYPE              Row Labels  TOTAL COUNT Label_header
0       87I     ei.cfc.h87.n5.20211           98       CHANGE
1       87P      ep.cfc.m87.n5.2023          570       CHANGE
2       87P  NMMCMS.AC.2021_1_R.txt            1          CHN
3       87P       NS.AC.201_1_R.txt            1          CHN
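A different technique, not part of the answer above, reaches the same result by numbering the header blocks: the cumulative sum of the mask increments at every header row, and `groupby(...).transform("first")` then broadcasts each block's first Row Labels value to all of its rows. A sketch, again assuming the question's data is built inline; `block` is a hypothetical name used only here:

```python
import numpy as np
import pandas as pd

# Input data from the question, built inline instead of read from Excel
df = pd.DataFrame({
    "FILE TYPE": [np.nan, "87I", "87P", np.nan, "87P", "87P"],
    "Row Labels": ["CHANGE", "ei.cfc.h87.n5.20211", "ep.cfc.m87.n5.2023",
                   "CHN", "NMMCMS.AC.2021_1_R.txt", "NS.AC.201_1_R.txt"],
    "TOTAL COUNT": [668, 98, 570, 5642, 1, 1],
})

m = df["FILE TYPE"].isna()

# Each header row starts a new block: True counts as 1 in the running sum
block = m.cumsum()

# The first Row Labels in each block is its header row
df["Label_header"] = df.groupby(block)["Row Labels"].transform("first")

out = df[~m].reset_index(drop=True)
print(out)
```

One difference from the ffill version: data rows before the first header would form block 0 and be labelled with their own first Row Labels value instead of NaN, so the ffill approach is the safer choice if the sheet might not start with a header row.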

huangapple
  • Published on April 11, 2023 at 13:35:05
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75982663.html