Posted 2023-02-09 00:14:16

Fill NA values over varied data frame column slices in Pandas

Question

I have a Pandas data frame similar to the following:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'End' : ['2022-03','2022-05','2022-06'],
    '2022-01' : [1,2,np.nan],
    '2022-02' : [np.nan,3,4],
    '2022-03' : [np.nan,1,3],
    '2022-04' : [np.nan,np.nan,2],
    '2022-05' : [np.nan,np.nan,np.nan],
    '2022-06' : [np.nan,np.nan,np.nan]
})


I would like to fill the NaN values in each row so that every column up to and including the one named in `End` is set to 0, while the columns after it remain NaN.

The desired output would be:

pd.DataFrame({
    'End' : ['2022-03','2022-05','2022-06'],
    '2022-01' : [1,2,0],
    '2022-02' : [0,3,4],
    '2022-03' : [0,1,3],
    '2022-04' : [np.nan,0,2],
    '2022-05' : [np.nan,0,0],
    '2022-06' : [np.nan,np.nan,0]
})


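Answer 1

Score: 2

Use broadcasting to compare each row's `End` value against the month column labels, then mask with `where`:

df.iloc[:,1:] = df.iloc[:,1:].fillna(0).where(df['End'].to_numpy()[:,None] >= [df.columns[1:]])

Or, more safely, if columns after `End` may hold non-NaN values that should be kept:

df.iloc[:,1:] = np.where(df['End'].to_numpy()[:,None] >= [df.columns[1:]],
                         df.iloc[:,1:].fillna(0), df.iloc[:,1:])

Output: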

           End  2022-01  2022-02  2022-03  2022-04  2022-05  2022-06
    0  2022-03      1.0      0.0      0.0      NaN      NaN      NaN
    1  2022-05      2.0      3.0      1.0      0.0      0.0      NaN
    2  2022-06      0.0      4.0      3.0      2.0      0.0      0.0
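
The difference between the two variants only shows up when a row has a real value after its `End` month. The sketch below uses a small hypothetical frame (not the data from the question) in which row 0 holds `7.0` in a month after its `End`; the names `months`, `in_range`, `variant_1` and `variant_2` are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 0 has a real value (7.0) in a month after its End.
df = pd.DataFrame({
    'End': ['2022-02', '2022-03'],
    '2022-01': [np.nan, 1.0],
    '2022-02': [2.0, np.nan],
    '2022-03': [7.0, np.nan],
})

months = df.iloc[:, 1:]

# (n_rows, 1) >= (n_months,) broadcasts to an (n_rows, n_months) boolean array.
in_range = df['End'].to_numpy()[:, None] >= df.columns[1:].to_numpy()

# Variant 1: where() replaces everything outside the range with NaN,
# so the 7.0 in row 0 is lost.
variant_1 = months.fillna(0).where(in_range)

# Variant 2: np.where() falls back to the original values outside the range,
# so the 7.0 in row 0 survives.
variant_2 = pd.DataFrame(
    np.where(in_range, months.fillna(0), months),
    index=months.index,
    columns=months.columns,
)

print(variant_1)
print(variant_2)
```

With the data from the question both variants give the same result, because every value after `End` is already NaN.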

Note: It might be better to set `End` as the index.
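
One possible reading of that note, as a minimal sketch: with `End` as the index, every remaining column is a month, so the positional `iloc[:, 1:]` slicing is no longer needed. The names `indexed`, `in_range`, `filled` and `result` are assumptions made for this illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'End': ['2022-03', '2022-05', '2022-06'],
    '2022-01': [1, 2, np.nan],
    '2022-02': [np.nan, 3, 4],
    '2022-03': [np.nan, 1, 3],
    '2022-04': [np.nan, np.nan, 2],
    '2022-05': [np.nan, np.nan, np.nan],
    '2022-06': [np.nan, np.nan, np.nan],
})

# With End as the index, all remaining columns are month columns.
indexed = df.set_index('End')

# Zero-padded "YYYY-MM" strings sort chronologically, so plain string
# comparison is enough; the result has shape (n_rows, n_months).
in_range = indexed.index.to_numpy()[:, None] >= indexed.columns.to_numpy()

# Set a cell to 0 only if it is NaN and lies at or before the row's End month.
filled = indexed.mask(in_range & indexed.isna().to_numpy(), 0)

# Bring End back as an ordinary column.
result = filled.reset_index()
print(result)
```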

Answer 2

Score: 2

Use NumPy broadcasting to compare the `End` values against the column labels, then combine `mask` and `fillna`:

mask = df['End'].to_numpy()[:, None] >= df.columns.to_numpy()

out = df.fillna(df.mask(mask, 0))

print(out)

Output:

       End  2022-01  2022-02  2022-03  2022-04  2022-05  2022-06
0  2022-03      1.0      0.0      0.0      NaN      NaN      NaN
1  2022-05      2.0      3.0      1.0      0.0      0.0      NaN
2  2022-06      0.0      4.0      3.0      2.0      0.0      0.0

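Intermediate mask (the first column is the comparison against the `End` label itself, which is always False here):

array([[False,  True,  True,  True, False, False, False],
       [False,  True,  True,  True,  True,  True, False],
       [False,  True,  True,  True,  True,  True,  True]])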
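
The string comparison above relies on the labels being zero-padded `YYYY-MM` strings, which sort chronologically. As a hedged alternative, the sketch below parses both sides with `pd.to_datetime` before broadcasting, so the mask is built from real dates and covers only the month columns. The names `month_cols`, `end_dates`, `col_dates`, `in_range` and `fill_here` are assumptions for this illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'End': ['2022-03', '2022-05', '2022-06'],
    '2022-01': [1, 2, np.nan],
    '2022-02': [np.nan, 3, 4],
    '2022-03': [np.nan, 1, 3],
    '2022-04': [np.nan, np.nan, 2],
    '2022-05': [np.nan, np.nan, np.nan],
    '2022-06': [np.nan, np.nan, np.nan],
})

# Every column except End is treated as a month column.
month_cols = df.columns.drop('End')

# Parse both sides into datetime64 arrays.
end_dates = pd.to_datetime(df['End'], format='%Y-%m').to_numpy()
col_dates = pd.to_datetime(month_cols, format='%Y-%m').to_numpy()

# (n_rows, 1) >= (n_months,) broadcasts to (n_rows, n_months).
in_range = end_dates[:, None] >= col_dates

# Fill only cells that are NaN and at or before the row's End month.
fill_here = in_range & df[month_cols].isna().to_numpy()

out = df.copy()
out[month_cols] = df[month_cols].mask(fill_here, 0)
print(out)
```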

Answer 3

Score: 2

Probably not the most elegant solution, but this can be done with `pd.melt` and `pd.pivot`:

melt_df = df.melt(id_vars=["End"])
melt_df.loc[(melt_df["End"] >= melt_df["variable"]) & (melt_df["value"].isnull()), "value"] = 0

The long format makes the condition easier to check. Then pivot back to the original wide layout:

final_df = melt_df.pivot(index="End", columns="variable", values="value").reset_index()
final_df.columns.name = None

       End  2022-01  2022-02  2022-03  2022-04  2022-05  2022-06
0  2022-03      1.0      0.0      0.0      NaN      NaN      NaN
1  2022-05      2.0      3.0      1.0      0.0      0.0      NaN
2  2022-06      0.0      4.0      3.0      2.0      0.0      0.0
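
Pivoting on `End` alone assumes that every `End` value is unique; with repeated values `pivot` raises an error about duplicate entries. A minimal sketch of one way around this, assuming it is acceptable to carry the original row number through the reshape, is shown below. The frame is hypothetical (two rows share the same `End`), and the names `long`, `needs_zero` and `wide` are illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical data in which two rows share the same End value.
df = pd.DataFrame({
    'End': ['2022-02', '2022-02', '2022-03'],
    '2022-01': [1, np.nan, np.nan],
    '2022-02': [np.nan, 5, 4],
    '2022-03': [np.nan, np.nan, np.nan],
})

# reset_index() adds an 'index' column holding the original row number,
# which keeps rows distinguishable after melting.
long = df.reset_index().melt(id_vars=['index', 'End'], var_name='month')

# A cell needs a zero if it is NaN and its month is at or before End.
needs_zero = long['value'].isna() & (long['month'] <= long['End'])
long.loc[needs_zero, 'value'] = 0

# Pivot on (row number, End) so duplicate End values do not collide.
wide = long.pivot(index=['index', 'End'], columns='month', values='value')
wide = wide.reset_index(level='End')
wide.index.name = None
wide.columns.name = None
print(wide)
```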
