How to impute missing rows with mean in pandas?

Question


I have a DataFrame indexed by datetimes spaced one hour apart. However, sometimes a row is missing, as in the example below, where there is no row for 2019-01-01 04:00:00:

                         price_eur
    datetime
    2019-01-01 00:00:00       51.0
    2019-01-01 01:00:00      46.27
    2019-01-01 02:00:00      39.78
    2019-01-01 03:00:00       20.0
    2019-01-01 05:00:00       22.0
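To reproduce the situation, here is a minimal sketch that rebuilds the sample frame by hand (the construction is an assumption reconstructed from the table above; the name prices_df is borrowed from the resample call later in the question) and uses asfreq("1h") to reveal which hours are missing.

```python
import pandas as pd

# Rebuild the sample data from the question (values copied from the table);
# the 2019-01-01 04:00:00 row is deliberately absent
prices_df = pd.DataFrame(
    {"price_eur": [51.0, 46.27, 39.78, 20.0, 22.0]},
    index=pd.DatetimeIndex(
        [
            "2019-01-01 00:00:00",
            "2019-01-01 01:00:00",
            "2019-01-01 02:00:00",
            "2019-01-01 03:00:00",
            "2019-01-01 05:00:00",
        ],
        name="datetime",
    ),
)

# asfreq("1h") reindexes onto a regular hourly grid from the first to the
# last timestamp; hours with no data show up as NaN rows
hourly = prices_df.asfreq("1h")
missing_hours = hourly.index[hourly["price_eur"].isna()]
print(missing_hours)
```

The NaN rows produced by asfreq are what any subsequent fill or interpolation step operates on.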

I want to impute each missing row with the average of the rows immediately before and after it, i.e. I want to obtain the following DataFrame:

                         price_eur
    datetime
    2019-01-01 00:00:00       51.0
    2019-01-01 01:00:00      46.27
    2019-01-01 02:00:00      39.78
    2019-01-01 03:00:00       20.0
    2019-01-01 04:00:00       21.0
    2019-01-01 05:00:00       22.0

I know I can use the resample method to fill the missing value with either the preceding or the following value, like so:

    prices_df.resample('1h').ffill(limit=1)  # previous value ('pad')
    prices_df.resample('1h').bfill(limit=1)  # next value ('backfill')

but I'm not sure how to impute with the mean. Can anybody help?
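One way to get the mean using only the resample fills mentioned above is to average a forward fill and a backward fill. This is a minimal sketch, assuming a pandas version where Resampler.ffill and Resampler.bfill accept limit; the sample data is rebuilt by hand from the question's table.

```python
import pandas as pd

# Sample data from the question, with the 04:00 row missing
prices_df = pd.DataFrame(
    {"price_eur": [51.0, 46.27, 39.78, 20.0, 22.0]},
    index=pd.DatetimeIndex(
        [
            "2019-01-01 00:00:00",
            "2019-01-01 01:00:00",
            "2019-01-01 02:00:00",
            "2019-01-01 03:00:00",
            "2019-01-01 05:00:00",
        ],
        name="datetime",
    ),
)

# Fill each gap from the previous row and, separately, from the next row;
# limit=1 fills at most one missing row per gap in each direction
before = prices_df.resample("1h").ffill(limit=1)
after = prices_df.resample("1h").bfill(limit=1)

# Average the two fills: a single missing row becomes the mean of its
# neighbours, and existing rows are unchanged because (x + x) / 2 == x
imputed = (before + after) / 2
print(imputed)
```

With limit=1, only isolated missing rows are filled. In a gap of two or more consecutive hours, each row is missing one of the two fills, so the sum is NaN and those rows stay NaN.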

Answer 1

Score: 2

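Use .interpolate(). asfreq('1h') first inserts each missing hour as a NaN row; interpolate() then fills it linearly from the surrounding values, which for a single missing row is exactly the mean of the rows before and after it: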

    df.asfreq('1h').interpolate()

                         price_eur
    datetime
    2023-05-31 00:00:00      51.00
    2023-05-31 01:00:00      46.27
    2023-05-31 02:00:00      39.78
    2023-05-31 03:00:00      20.00
    2023-05-31 04:00:00      21.00
    2023-05-31 05:00:00      22.00
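A self-contained version of this answer, run on the question's data (rebuilt by hand here, as an assumption) and checking that the inserted 04:00 row comes out as 21.0. The interpolate(method="time") line is an extra, not part of the original answer: it weights by elapsed time rather than row position, so on this regular hourly grid it should agree with the default linear method up to floating-point rounding.

```python
import pandas as pd

# Sample data from the question, with the 04:00 row missing
df = pd.DataFrame(
    {"price_eur": [51.0, 46.27, 39.78, 20.0, 22.0]},
    index=pd.DatetimeIndex(
        [
            "2019-01-01 00:00:00",
            "2019-01-01 01:00:00",
            "2019-01-01 02:00:00",
            "2019-01-01 03:00:00",
            "2019-01-01 05:00:00",
        ],
        name="datetime",
    ),
)

# asfreq("1h") inserts the missing 04:00 row as NaN
hourly = df.asfreq("1h")

# Default interpolate() is linear in row position: a single missing row
# gets the midpoint of its neighbours, i.e. their mean
result = hourly.interpolate()

# method="time" weights by elapsed time instead of row position; on a
# regular hourly grid it matches the linear result up to float rounding
result_time = hourly.interpolate(method="time")
print(result)
```

For a gap of several consecutive hours the two approaches diverge from a plain neighbour mean: linear interpolation produces a straight-line ramp between the bounding values instead of repeating their average.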

huangapple
  • Published on June 1, 2023 at 00:30:35
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76375607.html