使用Python,如何在两列中填充缺失的日期和数据。

huangapple go评论69阅读模式
英文:

Using python, how to fill in missing dates and data in two columns

问题

我有一个按月/日-时间排序的时间序列,后面跟着数值。由于设备故障,有些时间点的数据缺失。我想要替换这些时间点(例如下面的"21:00","01:00"),并且插值得到与它们相关的缺失数值。有什么好的方法可以做到这一点?

数据看起来像这样:

03/31 19:00 68.0
03/31 20:00 68.0
03/31 22:00 70.0
03/31 23:00 68.0
04/01 00:00 69.0
04/01 02:00 70.0

"04/01 00:00"的数值是字符串,而观测值是浮点数。

我使用以下代码将字符串日期转换为数字:
date_number=datetime.strptime(col_1[i],'%m/%d %H:%M') 这会得到"1900-03-31 19:00:00"作为结果。我可以对这些数值进行运算,找到缺失的时间点,将它们填充并在另一列的相同位置放入NaN值,然后对这些缺失的数值进行插值。我确信有一种更高效、标准的方法来解决这个问题,我想知道如何最好地做到这一点。

英文:

I have a time series of month/day-time followed by values. With equipment failure, some times are missing. I want to replace those times (e.g. 21:00, 01:00 below) and interpolate the missing values associated with them. What is a good way to do this?

The data looks like:

03/31 19:00 68.0
03/31 20:00 68.0
03/31 22:00 70.0
03/31 23:00 68.0
04/01 00:00 69.0
04/01 02:00 70.0

The "04/01 00:00" values are strings and the observations are floats.

I converted the string dates to numbers using:
date_number=datetime.strptime(col_1[i],'%m/%d %H:%M') which yields "1900-03-31 19:00:00" as the result. I can do arithmetic on those, find the gaps, fill them in and put nans in the same place in the other column then interpolate those missing values. I'm sure there is a more efficient, standard approach to the problem, and I'd like to know how to best do it.

答案1

得分: 1

假设有以下的数据框:

>>> df
          日期   值
0  03/31 19:00   68.0
1  03/31 20:00   68.0
2  03/31 22:00   70.0
3  03/31 23:00   68.0
4  04/01 00:00   69.0
5  04/01 02:00   70.0

你可以创建一个以日期为索引的Series,真正用于处理时间序列的操作:

df['日期'] = pd.to_datetime('2023/' + df['日期'], format='%Y/%m/%d %H:%M')
ts = df.set_index('日期')['值'].resample('H').interpolate()

输出:

>>> ts
日期
2023-03-31 19:00:00    68.0
2023-03-31 20:00:00    68.0
2023-03-31 21:00:00    69.0  # <- 在这里
2023-03-31 22:00:00    70.0
2023-03-31 23:00:00    68.0
2023-04-01 00:00:00    69.0
2023-04-01 01:00:00    69.5  # <- 在这里
2023-04-01 02:00:00    70.0
Freq: H, Name: 值, dtype: float64
英文:

Suppose the following dataframe:

>>> df
          Date  Value
0  03/31 19:00   68.0
1  03/31 20:00   68.0
2  03/31 22:00   70.0
3  03/31 23:00   68.0
4  04/01 00:00   69.0
5  04/01 02:00   70.0

You can create a Series indexed by Date to really work on TimeSeries:

df['Date'] = pd.to_datetime('2023/' + df['Date'], format='%Y/%m/%d %H:%M')
ts = df.set_index('Date')['Value'].resample('H').interpolate()

Output:

>>> ts
Date
2023-03-31 19:00:00    68.0
2023-03-31 20:00:00    68.0
2023-03-31 21:00:00    69.0  # <- HERE
2023-03-31 22:00:00    70.0
2023-03-31 23:00:00    68.0
2023-04-01 00:00:00    69.0
2023-04-01 01:00:00    69.5  # <- HERE
2023-04-01 02:00:00    70.0
Freq: H, Name: Value, dtype: float64

huangapple
  • 本文由 发表于 2023年4月4日 05:10:01
  • 转载请务必保留本文链接:https://go.coder-hub.com/75923792.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定