How to impute missing rows with mean in pandas?

Question

I have a dataframe whose index consists of datetimes spaced 1 hour apart. However, sometimes a row is missing, as in the example below, where there is no row for the datetime 2019-01-01 04:00:00:

                                 price_eur
    datetime
    2019-01-01 00:00:00          51.0
    2019-01-01 01:00:00          46.27
    2019-01-01 02:00:00          39.78
    2019-01-01 03:00:00          20.0
    2019-01-01 05:00:00          22.0

I want to impute the missing rows by taking the average of the elements of the rows immediately surrounding the missing one, i.e. I want to obtain the following dataframe:

                                 price_eur
    datetime
    2019-01-01 00:00:00          51.0
    2019-01-01 01:00:00          46.27
    2019-01-01 02:00:00          39.78
    2019-01-01 03:00:00          20.0
    2019-01-01 04:00:00          21.0
    2019-01-01 05:00:00          22.0

I know that I could use the resample method to impute the missing value with either the value preceding the missing one or the value following it, like so:

    prices_df.resample('1H').fillna('pad',limit=1)

But I'm not sure how to impute with the mean. Can anybody help?

Answer 1

Score: 2

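Use .asfreq() to insert the missing hourly rows as NaN, then .interpolate() to fill them by linear interpolation. For a single missing row this equals the mean of the two neighbouring values:

    df.asfreq('1h').interpolate()

                         price_eur
    datetime
    2023-05-31 00:00:00      51.00
    2023-05-31 01:00:00      46.27
    2023-05-31 02:00:00      39.78
    2023-05-31 03:00:00      20.00
    2023-05-31 04:00:00      21.00
    2023-05-31 05:00:00      22.00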
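
For anyone who wants to reproduce this end to end, here is a minimal sketch that rebuilds the frame from the question and applies the same two calls. The variable names (prices_df, hourly, filled) are just illustrative, and the frame is typed in by hand from the question's sample rather than loaded from the asker's real data:

```python
import pandas as pd

# Rebuild the question's sample frame; the 04:00 row is deliberately absent.
timestamps = pd.to_datetime([
    "2019-01-01 00:00:00",
    "2019-01-01 01:00:00",
    "2019-01-01 02:00:00",
    "2019-01-01 03:00:00",
    "2019-01-01 05:00:00",
])
prices_df = pd.DataFrame(
    {"price_eur": [51.0, 46.27, 39.78, 20.0, 22.0]},
    index=pd.DatetimeIndex(timestamps, name="datetime"),
)

# asfreq reindexes onto a regular hourly grid; timestamps that were
# missing from the original index appear as rows filled with NaN.
hourly = prices_df.asfreq("1h")

# Linear interpolation fills each NaN from its neighbours. With exactly
# one missing row between two known rows, that is their mean.
filled = hourly.interpolate(method="linear")

print(filled)
```

Note that this only matches "mean of the neighbours" when a single row is missing. If several consecutive rows are missing, linear interpolation spaces the filled values evenly between the two known neighbours instead of assigning the same mean to each of them.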

huangapple
  • Posted on June 1, 2023, 00:30:35
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76375607.html