How to impute missing rows with mean in pandas?
Question
I have a DataFrame indexed by datetimes at 1-hour intervals. However, sometimes a row is missing, as in the example below, where there is no row for 2019-01-01 04:00:00:
                     price_eur
datetime
2019-01-01 00:00:00       51.0
2019-01-01 01:00:00      46.27
2019-01-01 02:00:00      39.78
2019-01-01 03:00:00       20.0
2019-01-01 05:00:00       22.0
I want to impute the missing rows by taking the average of the elements of the rows immediately surrounding the missing one, i.e. I want to obtain the following dataframe:
                     price_eur
datetime
2019-01-01 00:00:00       51.0
2019-01-01 01:00:00      46.27
2019-01-01 02:00:00      39.78
2019-01-01 03:00:00       20.0
2019-01-01 04:00:00       21.0
2019-01-01 05:00:00       22.0
I know I could use the resample method to fill the missing value with either the preceding or the following value, like so:
prices_df.resample('1H').fillna('pad', limit=1)
but I'm not sure how to impute with the mean. Can anybody help?
Answer 1
Score: 2
df.asfreq('1h').interpolate()
                     price_eur
datetime
2023-05-31 00:00:00      51.00
2023-05-31 01:00:00      46.27
2023-05-31 02:00:00      39.78
2023-05-31 03:00:00      20.00
2023-05-31 04:00:00      21.00
2023-05-31 05:00:00      22.00
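For reference, here is a minimal self-contained sketch of the one-liner above, rebuilt on the sample data from the question. The variable names (`prices_df`, `filled`, `mean_filled`) are illustrative assumptions, not from the original answer. `asfreq` inserts the missing hourly rows as NaN, and `interpolate()` (linear by default) then fills them; for a single missing row this is the same as the mean of the two neighbours. For gaps longer than one row the two differ, so the sketch also shows one possible way to fill every missing row with the plain mean of the nearest values before and after the gap.

```python
import pandas as pd

# Sample data from the question: the 04:00 row is missing.
index = pd.to_datetime([
    "2019-01-01 00:00:00",
    "2019-01-01 01:00:00",
    "2019-01-01 02:00:00",
    "2019-01-01 03:00:00",
    "2019-01-01 05:00:00",
])
prices_df = pd.DataFrame(
    {"price_eur": [51.0, 46.27, 39.78, 20.0, 22.0]}, index=index
)
prices_df.index.name = "datetime"

# Reindex to a regular hourly grid (missing rows become NaN),
# then fill the NaNs by linear interpolation.
filled = prices_df.asfreq("1h").interpolate()
print(filled)

# Alternative (an assumption about what "mean" should mean for longer gaps):
# fill each NaN with the average of the last value before the gap
# and the first value after it.
hourly = prices_df.asfreq("1h")
mean_filled = hourly.fillna((hourly.ffill() + hourly.bfill()) / 2)
print(mean_filled)
```

On this data both approaches give 21.0 for 2019-01-01 04:00:00, since only one row is missing.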