June 1, 2023 00:30:35

How to impute missing rows with mean in pandas?

Question

I have a dataframe indexed by datetimes spaced 1 hour apart. However, sometimes a row is missing, as in the example below, where there is no row for the datetime 2019-01-01 04:00:00:

                             price_eur
datetime                         
2019-01-01 00:00:00          51.0
2019-01-01 01:00:00          46.27
2019-01-01 02:00:00          39.78
2019-01-01 03:00:00          20.0
2019-01-01 05:00:00          22.0

I want to impute the missing rows by taking the average of the elements of the rows immediately surrounding the missing one, i.e. I want to obtain the following dataframe:

                             price_eur
datetime                         
2019-01-01 00:00:00          51.0
2019-01-01 01:00:00          46.27
2019-01-01 02:00:00          39.78
2019-01-01 03:00:00          20.0
2019-01-01 04:00:00          21.0
2019-01-01 05:00:00          22.0

I know that I could use the resample method to impute the missing value either with the value preceding the missing one or the value following it, like so,

prices_df.resample('1H').fillna('pad', limit=1)

but I'm not sure how to impute with the mean. Can anybody help?

Answer 1

Score: 2

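Use .asfreq() to add the missing hourly rows as NaN, then .interpolate() to fill them (linear by default, which for a single missing row is the mean of its two neighbours):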

df.asfreq('1h').interpolate()

                     price_eur
datetime                      
2023-05-31 00:00:00      51.00
2023-05-31 01:00:00      46.27
2023-05-31 02:00:00      39.78
2023-05-31 03:00:00      20.00
2023-05-31 04:00:00      21.00
2023-05-31 05:00:00      22.00
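
For anyone who wants to reproduce this, here is a minimal self-contained sketch using the data from the question. It assumes the frame is called prices_df (as in the question) and that its index is a sorted DatetimeIndex without duplicates. The neighbour_mean variant is not part of the answer above; it is an illustrative alternative that averages the last known and next known value.

```python
import pandas as pd

# Sample data from the question; the 04:00 row is missing.
index = pd.DatetimeIndex(
    [
        "2019-01-01 00:00:00",
        "2019-01-01 01:00:00",
        "2019-01-01 02:00:00",
        "2019-01-01 03:00:00",
        "2019-01-01 05:00:00",
    ],
    name="datetime",
)
prices_df = pd.DataFrame({"price_eur": [51.0, 46.27, 39.78, 20.0, 22.0]}, index=index)

# asfreq reindexes to a regular hourly grid; missing timestamps become NaN rows.
hourly = prices_df.asfreq("1h")

# Linear interpolation (the default method) fills each NaN from its neighbours.
linear = hourly.interpolate()

# Illustrative alternative (an assumption, not from the answer):
# average of the forward-filled and backward-filled values.
# Rows that were already present are unchanged, because both fills return them as-is.
neighbour_mean = (hourly.ffill() + hourly.bfill()) / 2

print(linear)
```

With a single missing row both variants give the same result. When several consecutive hours are missing they differ: interpolate() produces evenly spaced values across the gap, while neighbour_mean fills every missing row with the same average of the two surrounding values.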
