Python code to extract a record from an Excel dataframe based on a condition and use it as a column value
Question
I am trying to extract records from a dataframe based on a condition and turn them into a column value. This is the code I have tried so far:
import pandas as pd

df = pd.read_excel("C:/Users/0403 - Copy (3).xlsx")

def add_action(row):
    if row["FILE TYPE"] == 'NaN':
        return row['Row Labels']

df = df.assign(Label_header=df.apply(add_action, axis=1))
Input dataframe:
  | FILE TYPE | Row Labels             | TOTAL COUNT |
0 | NaN       | CHANGE                 | 668         |
1 | 87I       | ei.cfc.h87.n5.20211    | 98          |
2 | 87P       | ep.cfc.m87.n5.2023     | 570         |
3 | NaN       | CHN                    | 5642        |
4 | 87P       | NMMCMS.AC.2021_1_R.txt | 1           |
5 | 87P       | NS.AC.201_1_R.txt      | 1           |
Expected output dataframe:
  | FILE TYPE | Row Labels             | TOTAL COUNT | Label_header |
0 | 87I       | ei.cfc.h87.n5.20211    | 98          | CHANGE       |
1 | 87P       | ep.cfc.m87.n5.2023     | 570         | CHANGE       |
2 | 87P       | NMMCMS.AC.2021_1_R.txt | 1           | CHN          |
3 | 87P       | NS.AC.201_1_R.txt      | 1           | CHN          |
Answer 1
Score: 1
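Use Series.isna to test for missing values in FILE TYPE. Combine it with Series.where and Series.ffill to keep Row Labels only on the rows where the mask is True and repeat each of those values downward. Finally, use boolean indexing with the mask inverted by ~ to drop the header rows: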
m = df["FILE TYPE"].isna()
df['Label_header'] = df['Row Labels'].where(m).ffill()
out = df[~m].reset_index(drop=True)
print(out)

   FILE TYPE              Row Labels  TOTAL COUNT  Label_header
0        87I     ei.cfc.h87.n5.20211           98        CHANGE
1        87P      ep.cfc.m87.n5.2023          570        CHANGE
2        87P  NMMCMS.AC.2021_1_R.txt            1           CHN
3        87P       NS.AC.201_1_R.txt            1           CHN
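To make the mask, where and ffill steps easier to follow, here is a small sketch that prints each intermediate result side by side. The dataframe is built by hand as a hypothetical stand-in for the Excel sheet in the question, and the names headers, filled and steps are only illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Excel sheet in the question.
df = pd.DataFrame({
    "FILE TYPE": [np.nan, "87I", "87P", np.nan, "87P", "87P"],
    "Row Labels": ["CHANGE", "ei.cfc.h87.n5.20211", "ep.cfc.m87.n5.2023",
                   "CHN", "NMMCMS.AC.2021_1_R.txt", "NS.AC.201_1_R.txt"],
    "TOTAL COUNT": [668, 98, 570, 5642, 1, 1],
})

m = df["FILE TYPE"].isna()           # True on the group-header rows
headers = df["Row Labels"].where(m)  # keep the header labels, NaN elsewhere
filled = headers.ffill()             # carry each header down to its group

# Show the three intermediate Series next to each other.
steps = pd.DataFrame({"mask": m, "where": headers, "ffill": filled})
print(steps)
```

Each header label is spread over the rows that follow it until the next header appears, which is why the header rows themselves can be dropped afterwards with ~m.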
Alternatively, the same steps can be written as a single chained expression with DataFrame.assign, which leaves the original df unchanged:

m = df["FILE TYPE"].isna()
out = df.assign(Label_header=df['Row Labels'].where(m).ffill())[~m].reset_index(drop=True)
print(out)

   FILE TYPE              Row Labels  TOTAL COUNT  Label_header
0        87I     ei.cfc.h87.n5.20211           98        CHANGE
1        87P      ep.cfc.m87.n5.2023          570        CHANGE
2        87P  NMMCMS.AC.2021_1_R.txt            1           CHN
3        87P       NS.AC.201_1_R.txt            1           CHN
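As a side note on why the attempt in the question probably never matched any row: the sketch below assumes the blank FILE TYPE cells arrived as float NaN (the usual result of reading empty Excel cells with pandas) rather than as the literal string 'NaN'. The two-row frame is a hypothetical cut-down of the question's data.

```python
import numpy as np
import pandas as pd

value = np.nan  # assumed content of an empty FILE TYPE cell

print(value == "NaN")   # False: a float is compared with a string
print(value == np.nan)  # False: NaN never compares equal, even to itself
print(pd.isna(value))   # True: the reliable missing-value test

def add_action(row):
    # Row-wise variant of the question's function, using pd.isna
    return row["Row Labels"] if pd.isna(row["FILE TYPE"]) else None

# Hypothetical two-row sample: one header row, one data row.
df = pd.DataFrame({
    "FILE TYPE": [np.nan, "87I"],
    "Row Labels": ["CHANGE", "ei.cfc.h87.n5.20211"],
})
labels = df.apply(add_action, axis=1)
print(labels)
```

Even with the test corrected, the row-wise function only fills in the header rows, so a forward fill is still needed to spread each label over its group. That is what the where and ffill combination in the answer does in one vectorized pass.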