Fill column values based on time of day

Question

I have timestamped data. I want to resample it and back-fill the column with the logged value, starting from a specified time of day and ending at the logged time.

Here is what the data looks like:

    df
    timestamp            col1
    2020-10-10 09:21:00  20
    2020-10-11 10:42:00  30
    ..
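
For anyone who wants to reproduce this, here is a minimal sketch of the sample frame. The post does not show how df was built, so the construction below is an assumption; the one thing that matters is that timestamp is a real datetime column rather than strings.

```python
import pandas as pd

# Sample frame matching the two rows shown above (construction is assumed,
# the original post only shows the printed data)
df = pd.DataFrame({
    "timestamp": ["2020-10-10 09:21:00", "2020-10-11 10:42:00"],
    "col1": [20, 30],
})

# Resampling needs a datetime dtype, so parse the strings explicitly
df["timestamp"] = pd.to_datetime(df["timestamp"])
```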

I want to resample it to 10-minute intervals and, for each day, fill col1 with the logged value from 06:00:00 up to the log time, so that it looks like this:

    df
    timestamp            col1
    2020-10-10 06:00:00  20
    2020-10-10 06:10:00  20
    2020-10-10 06:20:00  20
    ...
    2020-10-10 09:20:00  20
    2020-10-10 09:30:00  NaN
    2020-10-10 09:40:00  NaN
    ...
    2020-10-11 06:00:00  30
    2020-10-11 06:10:00  30
    ..
    2020-10-11 10:40:00  30

Answer 1

Score: 2

You can concat a starting point (06:00:00 on the first day) onto the data, resample to 10-minute bins, then apply groupby.bfill per day:

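    import pandas as pd

    # Prepend a row at 06:00:00 on the first day so the 10-minute grid starts there
    start = df['timestamp'].min().normalize() + pd.Timedelta('06:00:00')
    out = (
        pd.concat([pd.DataFrame({'timestamp': [start]}), df])
          .resample('10min', on='timestamp').mean().reset_index()
    )
    # Back-fill within each calendar day so a value never leaks into the previous day
    out = out.groupby(out['timestamp'].dt.normalize()).bfill()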

Output:

                   timestamp  col1
    0    2020-10-10 06:00:00  20.0
    1    2020-10-10 06:10:00  20.0
    2    2020-10-10 06:20:00  20.0
    3    2020-10-10 06:30:00  20.0
    4    2020-10-10 06:40:00  20.0
    ..                   ...   ...
    18   2020-10-10 09:00:00  20.0
    19   2020-10-10 09:10:00  20.0
    20   2020-10-10 09:20:00  20.0
    21   2020-10-10 09:30:00   NaN
    22   2020-10-10 09:40:00   NaN
    23   2020-10-10 09:50:00   NaN
    ..                   ...   ...
    168  2020-10-11 10:00:00  30.0
    169  2020-10-11 10:10:00  30.0
    170  2020-10-11 10:20:00  30.0
    171  2020-10-11 10:30:00  30.0
    172  2020-10-11 10:40:00  30.0
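
One thing worth checking: as far as I can tell, the per-day back-fill also fills the rows between midnight and 06:00:00 on every day after the first, because the resampled grid runs continuously through the night. If those rows should stay empty, a possible follow-up is to blank out everything whose time of day is earlier than the start time. The sketch below is self-contained; the sample frame and the names start_time and before_start are my own assumptions, not part of the original answer.

```python
import pandas as pd

# Assumed sample data, matching the two rows in the question
df = pd.DataFrame({
    "timestamp": pd.to_datetime(["2020-10-10 09:21:00", "2020-10-11 10:42:00"]),
    "col1": [20, 30],
})

start_time = pd.Timedelta("06:00:00")  # hypothetical name for the daily start

# Same approach as the answer: anchor the grid at 06:00 on the first day,
# resample to 10-minute bins, then back-fill within each calendar day
start = df["timestamp"].min().normalize() + start_time
out = (
    pd.concat([pd.DataFrame({"timestamp": [start]}), df])
    .resample("10min", on="timestamp").mean().reset_index()
)
day = out["timestamp"].dt.normalize()
out["col1"] = out.groupby(day)["col1"].bfill()

# Extra step: clear values whose time of day falls before the daily start
before_start = (out["timestamp"] - day) < start_time
out.loc[before_start, "col1"] = float("nan")
```

If the NaN rows are not needed at all, out.dropna(subset=["col1"]) afterwards leaves only the filled intervals.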

huangapple
  • Posted on March 20, 2023, 22:34:22
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75791623.html