InfluxDB: How to deal with missing data?
Question
Question Description
We run a lot of time series queries, usually through an API (Python). These queries sometimes cause problems and occasionally fail completely because data is missing.
We are not sure where to learn how to deal with missing data in our time series (InfluxDB) database.
Example
To describe the problem with an example:
We have some time series data. Say we measure room temperature. We have many rooms, and sometimes a sensor dies or stops working for a week or two before we replace it, so the data for that period is missing.
When we then run certain calculations, they fail. For example, computing the average temperature per day fails because on some days there are no sensor measurements.
One approach we considered is to interpolate the data for those days: take the last available value before the gap and the first available value after it, and use them to fill in the days that have no data.
This has many downsides. The major one is that the data is fake and cannot be trusted, and for our more serious processes we would prefer not to store fake (or interpolated) data.
We would like to know what the possible alternatives are, and where we can find resources to learn about this topic.
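As a rough illustration of the failure described above, here is a minimal sketch in plain Python with made-up readings. It groups readings by day and reports None for days without measurements, so the calculation completes without inventing values. The function name daily_means and the sample data are hypothetical and not part of any InfluxDB API.

```python
from collections import defaultdict
from datetime import date, datetime, timedelta
from statistics import mean


def daily_means(readings, start, end):
    """Return {day: mean or None} for every day from start to end inclusive."""
    # Group the raw values by calendar day.
    by_day = defaultdict(list)
    for ts, value in readings:
        by_day[ts.date()].append(value)

    # Walk every day in the range so that gaps show up explicitly as None.
    result = {}
    day = start
    while day <= end:
        values = by_day.get(day)
        result[day] = mean(values) if values else None
        day += timedelta(days=1)
    return result


# Hypothetical readings: the sensor was down on 2 and 3 January.
readings = [
    (datetime(2023, 1, 1, 8), 20.0),
    (datetime(2023, 1, 1, 20), 22.0),
    (datetime(2023, 1, 4, 12), 19.0),
]
averages = daily_means(readings, date(2023, 1, 1), date(2023, 1, 4))
for day, avg in averages.items():
    print(day, avg)
```

Keeping the gap as None leaves the decision about how (or whether) to fill it to whoever consumes the result.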
Answer 1
Score: 0
Answer
The idea is to represent the missing values (the gaps) as null or None. That way we can use InfluxDB's built-in fill.
https://docs.influxdata.com/influxdb/cloud/query-data/flux/fill/
As in the example on that page, once the null values are filled we can run further queries and analysis on the data.
The linked page describes the methods available for filling in missing values.
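To make the idea concrete, here is a minimal sketch in plain Python of a "use the previous value" fill applied at read time, similar in spirit to the usePrevious option of Flux's fill() described on the linked page. It is not the InfluxDB client API: the function fill_previous and the sample data are hypothetical. Each filled point is flagged, so nothing fake has to be stored and downstream code can tell real values from filled ones.

```python
def fill_previous(series):
    """Replace None with the last seen value and flag each filled point.

    series: list of (label, value_or_None) tuples in time order.
    Returns a list of (label, value, was_filled) tuples.
    Gaps before the first real value stay None.
    """
    filled = []
    last = None
    for label, value in series:
        if value is None:
            # Carry the previous value forward; mark it as filled only
            # if there actually was a previous value to carry.
            filled.append((label, last, last is not None))
        else:
            last = value
            filled.append((label, value, False))
    return filled


# Hypothetical daily averages with a two-day gap.
daily = [
    ("2023-01-01", 21.0),
    ("2023-01-02", None),
    ("2023-01-03", None),
    ("2023-01-04", 19.0),
]
result = fill_previous(daily)
for row in result:
    print(row)
```

Because the fill happens when the data is read, the stored series keeps its gaps, and stricter processes can simply skip any point whose flag is True.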