Fill NA values over varied data frame column slices in Pandas
Question
I have a Pandas data frame similar to the following:
```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'End' : ['2022-03','2022-05','2022-06'],
    '2022-01' : [1,2,np.nan],
    '2022-02' : [np.nan,3,4],
    '2022-03' : [np.nan,1,3],
    '2022-04' : [np.nan,np.nan,2],
    '2022-05' : [np.nan,np.nan,np.nan],
    '2022-06' : [np.nan,np.nan,np.nan]
})
```
I would like to fill the NaN values in each row so that NaNs in all columns up to and including the one listed in `End` are replaced with 0, while those after it remain NaN.
The desired output would be:
```python
pd.DataFrame({
    'End' : ['2022-03','2022-05','2022-06'],
    '2022-01' : [1,2,0],
    '2022-02' : [0,3,4],
    '2022-03' : [0,1,3],
    '2022-04' : [np.nan,0,2],
    '2022-05' : [np.nan,0,0],
    '2022-06' : [np.nan,np.nan,0]
})
```
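For checking the vectorized answers, a plain row-by-row version of the requirement can be handy. This is a minimal sketch, assuming `End` is the first column and the month labels are zero-padded `YYYY-MM` strings (so string comparison matches chronological order); `fill_until_end` is a hypothetical helper name, and the Python loop will be slow on large frames.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'End': ['2022-03', '2022-05', '2022-06'],
    '2022-01': [1, 2, np.nan],
    '2022-02': [np.nan, 3, 4],
    '2022-03': [np.nan, 1, 3],
    '2022-04': [np.nan, np.nan, 2],
    '2022-05': [np.nan, np.nan, np.nan],
    '2022-06': [np.nan, np.nan, np.nan],
})


def fill_until_end(frame):
    """Hypothetical helper: per row, fill NaN with 0 in month columns <= End."""
    out = frame.copy()
    months = out.columns[1:]  # assumes 'End' is the first column
    for idx, end in out['End'].items():
        # zero-padded 'YYYY-MM' strings sort chronologically
        upto = months[months <= end]
        out.loc[idx, upto] = out.loc[idx, upto].fillna(0)
    return out


result = fill_until_end(df)
print(result)
```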
Answer 1
Score: 2
Use broadcasting to compare the months, then mask with `where`:
```python
df.iloc[:,1:] = df.iloc[:,1:].fillna(0).where(df['End'].to_numpy()[:,None] >= [df.columns[1:]])
```
Or, more safely, when the cells after `End` may contain data other than NaN (the version above would overwrite them with NaN):
```python
df.iloc[:,1:] = np.where(df['End'].to_numpy()[:,None] >= [df.columns[1:]],
                         df.iloc[:,1:].fillna(0), df.iloc[:,1:])
```
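To see why the `np.where` variant is safer, the sketch below runs both variants on a small hypothetical frame in which row 0 has a real value (9) in `2022-05`, after its `End` of `2022-03`. It builds the same broadcast mask as the answer, written with explicit `[:, None]` / `[None, :]` axes instead of wrapping the columns in a list.

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 0 has a real value (9) in 2022-05, after its End of 2022-03
df = pd.DataFrame({
    'End': ['2022-03', '2022-05'],
    '2022-01': [1, np.nan],
    '2022-02': [np.nan, 3],
    '2022-03': [np.nan, np.nan],
    '2022-04': [np.nan, np.nan],
    '2022-05': [9, np.nan],
})

months = df.iloc[:, 1:]
# (n_rows, 1) vs (1, n_months) -> (n_rows, n_months) boolean mask
mask = df['End'].to_numpy()[:, None] >= months.columns.to_numpy()[None, :]

# Variant 1: where() turns every cell outside the mask into NaN, including the 9
v1 = months.fillna(0).where(mask)
# Variant 2: np.where() keeps the original cell outside the mask
v2 = pd.DataFrame(np.where(mask, months.fillna(0), months),
                  index=months.index, columns=months.columns)

print(v1.loc[0, '2022-05'], v2.loc[0, '2022-05'])  # prints: nan 9.0
```

With `where`, the 9 is lost; with `np.where`, it survives while the NaNs up to `End` still become 0.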
Output:
End 2022-01 2022-02 2022-03 2022-04 2022-05 2022-06
0 2022-03 1.0 0.0 0.0 NaN NaN NaN
1 2022-05 2.0 3.0 1.0 0.0 0.0 NaN
2 2022-06 0.0 4.0 3.0 2.0 0.0 0.0
Note: it might be better to set `End` as the index.
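Following the note, a sketch with `End` moved into the index, so the frame holds only month columns and no `iloc[:, 1:]` slicing is needed. It assumes the same zero-padded `YYYY-MM` labels, and only sets cells that are NaN and fall on or before `End`, so existing data after `End` is left alone.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'End': ['2022-03', '2022-05', '2022-06'],
    '2022-01': [1, 2, np.nan],
    '2022-02': [np.nan, 3, 4],
    '2022-03': [np.nan, 1, 3],
    '2022-04': [np.nan, np.nan, 2],
    '2022-05': [np.nan, np.nan, np.nan],
    '2022-06': [np.nan, np.nan, np.nan],
})

d = df.set_index('End')  # only month columns remain
# (rows, 1) End values vs (1, months) column labels -> (rows, months) boolean mask
upto_end = d.index.to_numpy()[:, None] >= d.columns.to_numpy()[None, :]
# Replace with 0 only the cells that are NaN and fall on or before End
d = d.mask(upto_end & d.isna().to_numpy(), 0)
print(d)
```

Calling `d.reset_index()` afterwards restores `End` as a regular column if the original layout is needed.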
Answer 2
Score: 2
Use NumPy broadcasting on the index/columns with `mask` and `fillna`:
```python
mask = df['End'].to_numpy()[:, None] >= df.columns.to_numpy()
out = df.fillna(df.mask(mask, 0))
print(out)
```
Output:
End 2022-01 2022-02 2022-03 2022-04 2022-05 2022-06
0 2022-03 1.0 0.0 0.0 NaN NaN NaN
1 2022-05 2.0 3.0 1.0 0.0 0.0 NaN
2 2022-06 0.0 4.0 3.0 2.0 0.0 0.0
Intermediate `mask` (its first column corresponds to `End` itself and is always `False`):
array([[False,  True,  True,  True, False, False, False],
       [False,  True,  True,  True,  True,  True, False],
       [False,  True,  True,  True,  True,  True,  True]])
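Both broadcasting answers compare `YYYY-MM` strings, which sort chronologically only because the month is zero-padded. Below is a sketch that parses the labels with `pd.to_datetime(..., format='%Y-%m')` before comparing, so the mask no longer depends on string order; it assumes `End` is the first column and otherwise reuses this answer's `fillna`/`mask` idea on the month columns only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'End': ['2022-03', '2022-05', '2022-06'],
    '2022-01': [1, 2, np.nan],
    '2022-02': [np.nan, 3, 4],
    '2022-03': [np.nan, 1, 3],
    '2022-04': [np.nan, np.nan, 2],
    '2022-05': [np.nan, np.nan, np.nan],
    '2022-06': [np.nan, np.nan, np.nan],
})

month_cols = df.columns[1:]  # assumes 'End' is the first column
# Parse labels into datetime64 values instead of comparing raw strings
end = pd.to_datetime(df['End'], format='%Y-%m').to_numpy()
starts = pd.to_datetime(month_cols, format='%Y-%m').to_numpy()
mask = end[:, None] >= starts[None, :]  # shape (rows, months)

out = df.copy()
# Fill NaN from a copy that has 0 wherever mask is True (NaN stays NaN elsewhere)
out[month_cols] = out[month_cols].fillna(out[month_cols].mask(mask, 0))
print(out)
```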
Answer 3
Score: 2
Probably not the most elegant solution, but it can be done using `pd.melt` and `pd.pivot`:
```python
melt_df = df.melt(id_vars=["End"])
melt_df.loc[(melt_df["End"] >= melt_df["variable"]) & (melt_df["value"].isnull()), "value"] = 0
```
This makes checking your condition easier. Then pivot back to the original wide format:
```python
final_df = melt_df.pivot(index="End", columns="variable", values="value").reset_index()
final_df.columns.name = None
```
End 2022-01 2022-02 2022-03 2022-04 2022-05 2022-06
0 2022-03 1.0 0.0 0.0 NaN NaN NaN
1 2022-05 2.0 3.0 1.0 0.0 0.0 NaN
2 2022-06 0.0 4.0 3.0 2.0 0.0 0.0
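`pivot(index="End", ...)` needs the `End` values to be unique; if two rows share an `End`, pandas raises a `ValueError` about duplicate entries. Below is a sketch that keeps the original row label as an extra id column (named `index` by `reset_index`), so repeated `End` values still round-trip. The fourth row with a duplicate `End` is hypothetical, added only to exercise that case, and the names `long`, `fill` and `wide` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'End': ['2022-03', '2022-05', '2022-06', '2022-03'],  # last row: hypothetical duplicate End
    '2022-01': [1, 2, np.nan, np.nan],
    '2022-02': [np.nan, 3, 4, 5],
    '2022-03': [np.nan, 1, 3, np.nan],
    '2022-04': [np.nan, np.nan, 2, np.nan],
    '2022-05': [np.nan, np.nan, np.nan, np.nan],
    '2022-06': [np.nan, np.nan, np.nan, np.nan],
})

# Keep the original row label as an id so each (row, month) pair stays unique
long = df.reset_index().melt(id_vars=['index', 'End'], var_name='month')
fill = (long['End'] >= long['month']) & long['value'].isna()
long.loc[fill, 'value'] = 0

# Pivot on (row label, End), then move End back to a regular first column
wide = long.pivot(index=['index', 'End'], columns='month', values='value')
wide = wide.reset_index(level='End').rename_axis(index=None, columns=None)
print(wide)
```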