Fill column values based on time of day
Question
I have timestamped data. I want to resample it and backfill each day's rows with the logged value, starting from a specified time and running up to the time the value was logged.
Here is what the data looks like:
df
timestamp col1
2020-10-10 09:21:00 20
2020-10-11 10:42:00 30
...
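For anyone who wants to reproduce this, the two rows shown above can be rebuilt as a small frame. This is only a sketch of the sample: it assumes timestamp is a datetime64 column and col1 is numeric, which the question does not state explicitly.

```python
import pandas as pd

# Sample frame mirroring the two rows shown in the question.
# Assumption: 'timestamp' is parsed to datetime64, 'col1' is an integer column.
df = pd.DataFrame({
    "timestamp": pd.to_datetime(["2020-10-10 09:21:00", "2020-10-11 10:42:00"]),
    "col1": [20, 30],
})
print(df)
```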
I want to resample it to 10-minute intervals, and fill col1 values starting from 06:00:00 with the logged value until the log time, to look like this:
df
timestamp col1
2020-10-10 06:00:00 20
2020-10-10 06:10:00 20
2020-10-10 06:20:00 20
...
2020-10-10 09:20:00 20
2020-10-10 09:30:00 NaN
2020-10-10 09:40:00 NaN
...
2020-10-11 06:00:00 30
2020-10-11 06:10:00 30
...
2020-10-11 10:40:00 30
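Note that in the desired output the last filled row of each day is the 10-minute slot containing the log time (09:21 lands in 09:20, 10:42 in 10:40). A quick way to check which slot a timestamp falls into, assuming the slots are aligned to the hour, is to floor it:

```python
import pandas as pd

# Log times from the question's sample data.
logged = pd.to_datetime(["2020-10-10 09:21:00", "2020-10-11 10:42:00"])

# Floor each log time to the start of its 10-minute slot.
bins = logged.floor("10min")
print(bins)
```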
Answer 1
Score: 2
You can concat your starting point onto the data, resample, then groupby.bfill per day:
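import pandas as pd
# Seed row at 06:00 on the first day so the resampled grid starts there
start = df['timestamp'].min().normalize() + pd.Timedelta('06:00:00')
out = (
    pd.concat([pd.DataFrame({'timestamp': [start]}), df])
      .resample('10min', on='timestamp').mean().reset_index()
)
# Backfill within each calendar day so a value never leaks into the previous day
out.groupby(out['timestamp'].dt.normalize()).bfill()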
Output:
timestamp col1
0 2020-10-10 06:00:00 20.0
1 2020-10-10 06:10:00 20.0
2 2020-10-10 06:20:00 20.0
3 2020-10-10 06:30:00 20.0
4 2020-10-10 06:40:00 20.0
.. ... ...
18 2020-10-10 09:00:00 20.0
19 2020-10-10 09:10:00 20.0
20 2020-10-10 09:20:00 20.0
21 2020-10-10 09:30:00 NaN
22 2020-10-10 09:40:00 NaN
23 2020-10-10 09:50:00 NaN
.. ... ...
168 2020-10-11 10:00:00 30.0
169 2020-10-11 10:10:00 30.0
170 2020-10-11 10:20:00 30.0
171 2020-10-11 10:30:00 30.0
172 2020-10-11 10:40:00 30.0
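One thing worth checking against the question: because the resampled grid runs continuously from 06:00 on the first day, later days appear to be backfilled from midnight rather than from 06:00 (rows 24 to 167 are elided in the output above, so this is easy to miss). If only slots from 06:00 onward are wanted, one possible extra step is to drop the earlier ones. The sketch below rebuilds the two sample rows and assumes timestamp is already datetime64; the names start, seed, offset and filled are illustrative and not part of the answer.

```python
import pandas as pd

df = pd.DataFrame({
    "timestamp": pd.to_datetime(["2020-10-10 09:21:00", "2020-10-11 10:42:00"]),
    "col1": [20, 30],
})

start = pd.Timedelta("06:00:00")  # daily start time taken from the question

# Same steps as the answer: seed the first day's start, resample, backfill per day.
seed = pd.DataFrame({"timestamp": [df["timestamp"].min().normalize() + start]})
out = pd.concat([seed, df]).resample("10min", on="timestamp").mean().reset_index()
filled = out.groupby(out["timestamp"].dt.normalize()).bfill()

# Extra step (not in the answer): keep only slots at or after the daily start.
offset = filled["timestamp"] - filled["timestamp"].dt.normalize()
filled = filled[offset >= start].reset_index(drop=True)
print(filled)
```

On the sample data this keeps 137 rows: 06:00 to 23:50 on the first day (NaN after 09:20) and 06:00 to 10:40 on the second. Whether the NaN rows after each log time should be kept is a separate choice; filled.dropna() would remove them.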