Fill column values based on time of day

Question

I have timestamped data that I want to resample, back-filling each row's logged value from a specified start time up to the time it was logged.

Here is what the data looks like:

df
timestamp              col1  
2020-10-10 09:21:00     20     
2020-10-11 10:42:00     30          
..
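
To reproduce this, the sample frame can be built roughly as follows. This is a minimal sketch that includes only the two rows shown above (the elided rows are not guessed) and assumes `timestamp` is a datetime column rather than a string:

```python
import pandas as pd

# Only the two rows shown in the question; further rows are elided there.
df = pd.DataFrame({
    "timestamp": pd.to_datetime(["2020-10-10 09:21:00", "2020-10-11 10:42:00"]),
    "col1": [20, 30],
})
print(df)
```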

I want to resample it to 10-minute intervals and, for each day, fill col1 with the logged value from 06:00:00 up to the logged time, so that it looks like this:

df
timestamp              col1       
2020-10-10 06:00:00     20
2020-10-10 06:10:00     20
2020-10-10 06:20:00     20
...
2020-10-10 09:20:00     20
2020-10-10 09:30:00     NaN
2020-10-10 09:40:00     NaN
...
2020-10-11 06:00:00     30
2020-10-11 06:10:00     30
..
2020-10-11 10:40:00     30

Answer 1

Score: 2

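You can concat a row for your starting point (06:00 on the first day), resample to 10-minute bins, and then use groupby.bfill per day so that values are not back-filled across midnight:

import pandas as pd

# Prepend a row at 06:00 on the first day so the resampled range starts there.
start = df['timestamp'].min().normalize() + pd.Timedelta('06:00:00')
out = (
    pd.concat([pd.DataFrame({'timestamp': [start]}), df])
      .resample('10min', on='timestamp').mean().reset_index()
)
# Back-fill within each calendar day only.
out = out.groupby(out['timestamp'].dt.normalize()).bfill()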

Output:

              timestamp  col1
0   2020-10-10 06:00:00  20.0
1   2020-10-10 06:10:00  20.0
2   2020-10-10 06:20:00  20.0
3   2020-10-10 06:30:00  20.0
4   2020-10-10 06:40:00  20.0
..                  ...   ...
18  2020-10-10 09:00:00  20.0
19  2020-10-10 09:10:00  20.0
20  2020-10-10 09:20:00  20.0
21  2020-10-10 09:30:00   NaN
22  2020-10-10 09:40:00   NaN
23  2020-10-10 09:50:00   NaN
..                  ...   ...
168 2020-10-11 10:00:00  30.0
169 2020-10-11 10:10:00  30.0
170 2020-10-11 10:20:00  30.0
171 2020-10-11 10:30:00  30.0
172 2020-10-11 10:40:00  30.0
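
Note that the resampled range is continuous, so on every day after the first it also contains the bins between 00:00 and 05:50, and these are back-filled as well. If those rows are unwanted, one possible follow-up is to drop every bin that falls before 06:00 on its own day. The sketch below repeats the steps above on the two sample rows and then trims; the names `start_offset` and `trimmed` are illustrative only, and the per-day back-fill is written on `col1` explicitly so the `timestamp` column is left untouched:

```python
import pandas as pd

df = pd.DataFrame({
    "timestamp": pd.to_datetime(["2020-10-10 09:21:00", "2020-10-11 10:42:00"]),
    "col1": [20, 30],
})

start_offset = pd.Timedelta("06:00:00")  # illustrative name for the daily start time

# Same steps as in the answer: add a 06:00 anchor row, resample, back-fill per day.
start = df["timestamp"].min().normalize() + start_offset
out = (
    pd.concat([pd.DataFrame({"timestamp": [start]}), df])
      .resample("10min", on="timestamp").mean().reset_index()
)
out["col1"] = out.groupby(out["timestamp"].dt.normalize())["col1"].bfill()

# Keep only the bins at or after 06:00 on their own day.
time_of_day = out["timestamp"] - out["timestamp"].dt.normalize()
trimmed = out[time_of_day >= start_offset].reset_index(drop=True)
print(trimmed)
```

Rows after the logged time on each day stay in the result as NaN, as in the question's expected output.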

huangapple
  • Posted on March 20, 2023 at 22:34:22
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75791623.html