April 4, 2023 05:10:01

Using Python, how to fill in missing dates and data in two columns

Question

I have a time series of month/day timestamps, each followed by a value. Because of equipment failures, some timestamps are missing. I want to insert the missing times (e.g. 21:00 and 01:00 below) and interpolate the values that belong to them. What is a good way to do this?

The data looks like:

03/31 19:00 68.0
03/31 20:00 68.0
03/31 22:00 70.0
03/31 23:00 68.0
04/01 00:00 69.0
04/01 02:00 70.0

The timestamps (e.g. "04/01 00:00") are strings and the observations are floats.

I converted the string dates to datetime objects using:
date_number = datetime.strptime(col_1[i], '%m/%d %H:%M')
which yields "1900-03-31 19:00:00" (the year defaults to 1900 because the format contains none). I could do arithmetic on those, find the gaps, fill them in, put NaNs at the same positions in the other column, and then interpolate the missing values. I'm sure there is a more efficient, standard approach to the problem, and I'd like to know how best to do it.
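
For comparison, the manual approach described in the question can be sketched with the standard library alone. This is only an illustration under stated assumptions: the year (2023), the one-hour step, and the names `rows`, `parse`, and `fill_hourly` are not from the original post, and the input is assumed to be sorted with every timestamp falling exactly on the hourly grid.

```python
from datetime import datetime, timedelta

# Sample data from the question: (timestamp string, observation).
rows = [
    ("03/31 19:00", 68.0),
    ("03/31 20:00", 68.0),
    ("03/31 22:00", 70.0),
    ("03/31 23:00", 68.0),
    ("04/01 00:00", 69.0),
    ("04/01 02:00", 70.0),
]

YEAR = 2023                # assumption: the data carries no year
STEP = timedelta(hours=1)  # assumption: readings are expected hourly


def parse(stamp):
    # Prepend a year so the result does not default to 1900.
    return datetime.strptime(f"{YEAR}/{stamp}", "%Y/%m/%d %H:%M")


def fill_hourly(rows):
    """Return (datetime, value) pairs on a regular grid, filling gaps linearly."""
    known = [(parse(stamp), value) for stamp, value in rows]
    filled = [known[0]]
    for (t0, v0), (t1, v1) in zip(known, known[1:]):
        n_steps = (t1 - t0) // STEP  # whole steps between two known readings
        for k in range(1, n_steps):
            # Linear interpolation between the two neighbouring readings.
            filled.append((t0 + k * STEP, v0 + (v1 - v0) * k / n_steps))
        filled.append((t1, v1))
    return filled


filled = fill_hourly(rows)
for t, v in filled:
    print(t.strftime("%m/%d %H:%M"), v)
```

This works for small inputs, but it hard-codes the gap handling that a time-series library provides out of the box.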

Answer 1

Score: 1

Suppose you have the following DataFrame:

>>> df
          Date  Value
0  03/31 19:00   68.0
1  03/31 20:00   68.0
2  03/31 22:00   70.0
3  03/31 23:00   68.0
4  04/01 00:00   69.0
5  04/01 02:00   70.0

You can create a Series indexed by Date so that pandas' time-series methods apply, then resample it to an hourly frequency and interpolate the gaps. The source strings have no year, so the example prepends 2023:

import pandas as pd

df['Date'] = pd.to_datetime('2023/' + df['Date'], format='%Y/%m/%d %H:%M')
ts = df.set_index('Date')['Value'].resample('H').interpolate()

Output:

>>> ts
Date
2023-03-31 19:00:00    68.0
2023-03-31 20:00:00    68.0
2023-03-31 21:00:00    69.0  # <- HERE
2023-03-31 22:00:00    70.0
2023-03-31 23:00:00    68.0
2023-04-01 00:00:00    69.0
2023-04-01 01:00:00    69.5  # <- HERE
2023-04-01 02:00:00    70.0
2023-04-01 02:00:00    70.0
Freq: H, Name: Value, dtype: float64
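
A self-contained variant of the same idea may help to see the two steps separately: first put the series on a regular hourly grid (gaps become NaN), then interpolate. This is a sketch, not the answer's exact code. The year 2023 is an assumption, as in the answer, and the names `series`, `hourly`, `missing`, and `filled` are illustrative. It uses the lowercase `"h"` alias because, as far as I know, recent pandas releases deprecate the uppercase `"H"` spelling.

```python
import pandas as pd

# Rebuild the sample data from the question.
df = pd.DataFrame(
    {
        "Date": ["03/31 19:00", "03/31 20:00", "03/31 22:00",
                 "03/31 23:00", "04/01 00:00", "04/01 02:00"],
        "Value": [68.0, 68.0, 70.0, 68.0, 69.0, 70.0],
    }
)

# Assumption: the readings are from 2023; the raw strings carry no year.
df["Date"] = pd.to_datetime("2023/" + df["Date"], format="%Y/%m/%d %H:%M")
series = df.set_index("Date")["Value"]

# Step 1: reindex onto a regular hourly grid; missing hours show up as NaN.
hourly = series.asfreq("h")
missing = hourly[hourly.isna()].index

# Step 2: fill the NaNs, weighting by elapsed time between known readings.
filled = hourly.interpolate(method="time")

print(list(missing.strftime("%m/%d %H:%M")))
print(filled)
```

Keeping `missing` around makes it easy to flag which rows were measured and which were estimated.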
