Fill NA values over varied data frame column slices in Pandas

Question

I have a Pandas data frame similar to the following:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'End': ['2022-03', '2022-05', '2022-06'],
    '2022-01': [1, 2, np.nan],
    '2022-02': [np.nan, 3, 4],
    '2022-03': [np.nan, 1, 3],
    '2022-04': [np.nan, np.nan, 2],
    '2022-05': [np.nan, np.nan, np.nan],
    '2022-06': [np.nan, np.nan, np.nan]
})
```

I would like to fill the NaN values in each row so that every column up to and including the one named in `End` is replaced with 0, while the columns after it remain NaN.

The desired output would be:

```python
pd.DataFrame({
    'End': ['2022-03', '2022-05', '2022-06'],
    '2022-01': [1, 2, 0],
    '2022-02': [0, 3, 4],
    '2022-03': [0, 1, 3],
    '2022-04': [np.nan, 0, 2],
    '2022-05': [np.nan, 0, 0],
    '2022-06': [np.nan, np.nan, 0]
})
```

Answer 1

Score: 2

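Use broadcasting to compare each row's End month with the column labels (zero-padded 'YYYY-MM' strings sort chronologically, so a plain string comparison works), then mask with where:

df.iloc[:, 1:] = df.iloc[:, 1:].fillna(0).where(df['End'].to_numpy()[:, None] >= [df.columns[1:]])

Note that this sets everything after End to NaN. A safer variant, for when the columns after End may hold non-NaN values that should be kept:

df.iloc[:, 1:] = np.where(df['End'].to_numpy()[:, None] >= [df.columns[1:]],
                         df.iloc[:, 1:].fillna(0), df.iloc[:, 1:])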

Output:

       End  2022-01  2022-02  2022-03  2022-04  2022-05  2022-06
0  2022-03      1.0      0.0      0.0      NaN      NaN      NaN
1  2022-05      2.0      3.0      1.0      0.0      0.0      NaN
2  2022-06      0.0      4.0      3.0      2.0      0.0      0.0

Note: it might be cleaner to set End as the index, so that only the month columns remain.
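The note about using End as the index can be sketched as follows. This is only an illustration, not the answerer's code: the names `months`, `in_range`, `filled` and `out` are made up for the example, and it assumes the labels are zero-padded 'YYYY-MM' strings.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'End': ['2022-03', '2022-05', '2022-06'],
    '2022-01': [1, 2, np.nan],
    '2022-02': [np.nan, 3, 4],
    '2022-03': [np.nan, 1, 3],
    '2022-04': [np.nan, np.nan, 2],
    '2022-05': [np.nan, np.nan, np.nan],
    '2022-06': [np.nan, np.nan, np.nan],
})

# Move End into the index so that only the month columns remain.
months = df.set_index('End')

# Shape (n_rows, 1) compared with (n_cols,) broadcasts to (n_rows, n_cols).
in_range = months.index.to_numpy()[:, None] >= months.columns.to_numpy()

# Replace a cell with 0 only if it is NaN and lies at or before End;
# every other cell is left untouched.
filled = months.mask(months.isna().to_numpy() & in_range, 0)

# Bring End back as a regular column.
out = filled.reset_index()
print(out)
```

With End out of the way there is no need for positional slicing such as `iloc[:, 1:]`, so the code does not depend on End being the first column.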

Answer 2

Score: 2

Use numpy broadcasting to compare the End values with the column labels, then combine mask and fillna:

mask = df['End'].to_numpy()[:, None] >= df.columns.to_numpy()

out = df.fillna(df.mask(mask, 0))

print(out)

Output:

       End  2022-01  2022-02  2022-03  2022-04  2022-05  2022-06
0  2022-03      1.0      0.0      0.0      NaN      NaN      NaN
1  2022-05      2.0      3.0      1.0      0.0      0.0      NaN
2  2022-06      0.0      4.0      3.0      2.0      0.0      0.0

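Intermediate mask (the first column is the comparison against the 'End' label itself, which is False because digits sort before letters):

array([[False,  True,  True,  True, False, False, False],
       [False,  True,  True,  True,  True,  True, False],
       [False,  True,  True,  True,  True,  True,  True]])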
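The mask in this answer is computed against every column label, including End itself. A possible variation, sketched here under the assumption that all non-End columns are zero-padded 'YYYY-MM' labels, restricts the comparison to the month columns. The helper name `fill_until_end` and its parameters are hypothetical and do not come from the answer.

```python
import numpy as np
import pandas as pd

def fill_until_end(df, end_col="End", value=0):
    """Fill NaN with `value` in columns whose label is <= the row's end label.

    Hypothetical helper: assumes every column other than `end_col`
    is a zero-padded 'YYYY-MM' label.
    """
    month_cols = df.columns.drop(end_col)
    # (n_rows, 1) vs (n_months,) broadcasts to (n_rows, n_months).
    mask = df[end_col].to_numpy()[:, None] >= month_cols.to_numpy()
    months = df[month_cols]
    out = df.copy()
    # months.mask(mask, value) holds `value` wherever the column is in range;
    # fillna then copies those values only into cells that are NaN.
    out[month_cols] = months.fillna(months.mask(mask, value))
    return out

example = pd.DataFrame({
    "End": ["2022-02", "2022-01"],
    "2022-01": [np.nan, 5.0],
    "2022-02": [np.nan, np.nan],
})
result = fill_until_end(example)
print(result)
```

Because the function works on a copy, the input frame is left unchanged.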

Answer 3

Score: 2

Probably not the most elegant solution, but it can be done using pd.melt and pd.pivot:

melt_df = df.melt(id_vars=["End"])
melt_df.loc[(melt_df["End"] >= melt_df["variable"]) & (melt_df["value"].isnull()), "value"] = 0

The long format makes the condition easy to check. Then pivot back to the original layout (this requires the End values to be unique, otherwise pivot raises an error):

final_df = melt_df.pivot(index="End", columns="variable", values="value").reset_index()
final_df.columns.name = None

       End  2022-01  2022-02  2022-03  2022-04  2022-05  2022-06
0  2022-03      1.0      0.0      0.0      NaN      NaN      NaN
1  2022-05      2.0      3.0      1.0      0.0      0.0      NaN
2  2022-06      0.0      4.0      3.0      2.0      0.0      0.0
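The pivot step above uses End as the index, so it only works when every End value occurs once. If two rows can share an End value, one option is to carry the original row label through the melt. The sketch below assumes that scenario; the sample data and the names `long_df`, `to_fill` and `wide` are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "End": ["2022-02", "2022-02", "2022-03"],  # note the duplicated End value
    "2022-01": [1, np.nan, np.nan],
    "2022-02": [np.nan, 3, 4],
    "2022-03": [np.nan, np.nan, np.nan],
})

# reset_index() adds an "index" column holding the original row labels,
# which keeps every (row, month) pair unique after melting.
long_df = df.reset_index().melt(id_vars=["index", "End"])

# Same condition as in the answer: NaN cells at or before End become 0.
to_fill = (long_df["End"] >= long_df["variable"]) & long_df["value"].isna()
long_df.loc[to_fill, "value"] = 0

# Pivot on (index, End) rather than End alone, then restore the wide layout.
wide = long_df.pivot(index=["index", "End"], columns="variable", values="value")
final_df = wide.reset_index(level="End")
final_df.index.name = None
final_df.columns.name = None
print(final_df)
```

Passing a list to `index` in `pivot` needs pandas 1.1 or later.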

huangapple
  • Posted on February 9, 2023, 00:14:16
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75388644.html