Azure Stream Analytics reports watermark delay

Question

Azure Stream Analytics is used to distribute JSON data coming from Azure IoT Hub into tables of a SQL database.

Recently there were several large watermark delays that completely broke the pipeline.

When it works normally, it has a delay of 12 seconds, which is more or less acceptable.

The overall architecture is as follows: the IoT Edge devices send JSON messages to IoT Hub whenever they have some to send; each message is then passed to Stream Analytics as input; a query runs on Stream Analytics, and its outputs are SQL database tables. The SQL database serves data to Grafana on request, and several queries from Grafana dashboards run for monitoring purposes. Hence there are writes by Stream Analytics (SA) and reads from Grafana (and sometimes from Azure Data Studio).

Observations of the metrics have shown that:

  • The number of messages sent from the IoT Edge devices doesn't affect the delay
  • The delays happen when there are many sessions (there are 8 sessions when the delay is 12 seconds; more than 8 sessions leads to a larger delay and a growing session count, e.g., 130 sessions)
  • Sometimes a small surplus of sessions triggers the delay
  • It looks like the DB Data IO also affects the delay
  • When there are delays, the DB DTU usage is high, but not always
  • During delays, the CPU% of Stream Analytics drops.

The services used have the following characteristics:

  • DB DTUs 200
  • 300 GB
  • 3 streaming units
  • DB MAXDOP = 0
  • DB number of CPUs 4

Grafana queries are very simple: they read data for a 15-minute range. The data is not sampled frequently: at most once every 5 seconds.

There are no data conversion or runtime errors in Stream Analytics.

There are dashboards that use data from a different DB with 2 CPUs and a data frequency of 1 second. They work properly. Their Stream Analytics job has no delay, even though the Grafana query time range may be 12 hours.
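
To put the two setups side by side, the figures above can be turned into a rough upper bound on the rows a single series contributes to one panel query. This is only back-of-the-envelope arithmetic; it assumes one row per sample per series, which the post does not state, and the function name is made up for illustration.

```python
def rows_per_series(window_seconds: int, sample_interval_seconds: int) -> int:
    """Upper bound on the rows one series contributes to a time-range query."""
    if window_seconds < 0 or sample_interval_seconds <= 0:
        raise ValueError("window must be >= 0 and interval must be > 0")
    return window_seconds // sample_interval_seconds


# Affected setup: 15-minute range, at most one sample every 5 seconds.
affected = rows_per_series(15 * 60, 5)
# Working setup: 12-hour range, one sample per second.
working = rows_per_series(12 * 60 * 60, 1)

print(affected, working, working // affected)
```

Under that assumption the working dashboards read far more rows per series than the affected ones, which suggests that result size alone is unlikely to explain the delays.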

Also, according to the query insights on the SQL database panel in the Azure portal, the longest-running queries were only reading 5-15 minutes of data, yet they took too much time.

The affected Grafana dashboards have 3-6 panels, while the dashboards that work properly have more than 10 panels.

I found the following error among the Stream Analytics outputs (I studied the link carefully and it didn't help):

> Encountered error trying to write 1 event(s): Resource ID : 1. The
> request limit for the database is 400 and has been reached. See
> 'https://docs.microsoft.com/azure/azure-sql/database/resource-limits-logical-server'
> for assistance.
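
The quoted message looks like Azure SQL's resource-governance error. As far as I know, Resource ID 1 in that error refers to concurrent worker threads (requests) rather than to DTU consumption, and 400 appears to match the worker limit of a 200-DTU Standard database. One way to check this is to sample `sys.dm_db_resource_stats`, which exposes `max_worker_percent` and `max_session_percent` next to the CPU and Data IO percentages. The sketch below is not a definitive diagnostic: the T-SQL has to be run against the database separately, the Python part only classifies rows that were already fetched, and the function name and the 90% threshold are assumptions.

```python
# T-SQL to run against the affected database (for example from Azure Data Studio).
# sys.dm_db_resource_stats keeps roughly one hour of 15-second samples.
RESOURCE_STATS_SQL = """
SELECT end_time, avg_cpu_percent, avg_data_io_percent,
       max_worker_percent, max_session_percent
FROM sys.dm_db_resource_stats
ORDER BY end_time DESC;
"""

METRICS = (
    "avg_cpu_percent",
    "avg_data_io_percent",
    "max_worker_percent",
    "max_session_percent",
)


def saturated_samples(rows, threshold=90.0):
    """Return (end_time, [metric, ...]) for every sample where a metric reached the threshold.

    `rows` is an iterable of dict-like rows fetched with RESOURCE_STATS_SQL.
    The 90% default is an arbitrary illustrative threshold.
    """
    findings = []
    for row in rows:
        hot = [name for name in METRICS if float(row[name]) >= threshold]
        if hot:
            findings.append((row["end_time"], hot))
    return findings
```

If `max_worker_percent` reaches 100 while `avg_cpu_percent` stays moderate, that would point to too many concurrent (possibly blocked or long-running) requests rather than to a lack of compute.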

The strange thing is that the same pipeline, with the same number of Grafana dashboards and the same number of messages from IoT Hub, usually works with a delay of 12 seconds (which I would also expect to be lower, but at least it works). But about 20% of the time I have very bad delays.

I saw that people have had similar problems in other forums, but with no decent solution. I also couldn't solve it using ChatGPT.

There were several attempts to reduce the delay by KILL-ing all running DB queries. Most of them came from Grafana (previously we also kept Grafana always available for other users who queried a lot; those users are now disabled, since it seems the Grafana queries had become very long-running).
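
Before killing sessions wholesale, it may help to see which client owns them. A minimal sketch, assuming rows have already been fetched from `sys.dm_exec_sessions` with the query shown; the helper name is hypothetical, and the actual `program_name` values reported by Grafana or Stream Analytics are not known from the post.

```python
from collections import Counter

# T-SQL to run against the database; is_user_process = 1 filters out system sessions.
SESSIONS_SQL = """
SELECT session_id, login_name, host_name, program_name, status
FROM sys.dm_exec_sessions
WHERE is_user_process = 1;
"""


def sessions_by_client(rows):
    """Count sessions per program_name, most frequent first.

    `rows` is an iterable of dict-like rows fetched with SESSIONS_SQL.
    Sessions without a program name are grouped under "(unknown)".
    """
    counts = Counter((row["program_name"] or "(unknown)") for row in rows)
    return counts.most_common()
```

If one client accounts for most of the roughly 130 sessions seen during a delay, only that client's queries need attention.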

How can I solve this problem without increasing the number of DTUs (paying more)?
Should I increase the number of CPUs?

Answer 1

Score: 0

The delay is caused by the number of sessions. When there are more than 8 sessions, the delay increases.

Reduce the number of partitions handled by each streaming node so that each node receives less input data.

You can double the SUs so that each streaming node handles data from two partitions, which increases the streaming node count from 3 to 6. Alternatively, you can quadruple the SUs so that each streaming node handles data from one partition.
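
The arithmetic behind these figures can be sketched as follows. The partition count of 12 is not stated anywhere in the thread; it is only implied by the numbers above (6 nodes with two partitions each), so treat it as an assumption, along with the even distribution of partitions across nodes.

```python
import math


def partitions_per_node(partition_count: int, node_count: int) -> int:
    """Largest number of input partitions a single streaming node has to handle."""
    if partition_count <= 0 or node_count <= 0:
        raise ValueError("counts must be positive")
    return math.ceil(partition_count / node_count)


PARTITIONS = 12  # assumption: implied by "6 nodes, two partitions per node"

for nodes in (3, 6, 12):  # current, doubled SUs, quadrupled SUs
    print(f"{nodes} nodes -> {partitions_per_node(PARTITIONS, nodes)} partition(s) per node")
```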

You can also repartition your input with more partitions to reduce the amount of data in each partition.

For details, see "Analyze Stream Analytics job performance by using metrics and dimensions".

Regarding the error message you encountered, it seems that the request limit for the database has been reached.

Increasing the number of DTUs or CPUs would improve performance, but it would also increase the cost.

For more information, refer to "Repartitioning Azure Stream Analytics jobs" and "Resource management in Azure SQL".

Answer 2

Score: 0

Azure Stream Analytics reported the watermark delay because of long-running queries issued by the Grafana dashboards.

The Grafana dashboards are used to visualize data from the Azure SQL database. Each panel on a Grafana dashboard executes a query that reads data from the SQL DB.

Speeding up the queries by creating indexes made them run faster, which helped avoid watermark delays in Stream Analytics.
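
The effect of such an index can be illustrated with a small self-contained sketch. It uses an in-memory SQLite database as a stand-in, because an Azure SQL connection cannot be assumed here; the table, column, and index names are hypothetical, and the real schema and panel queries are not given in the thread. On Azure SQL the equivalent step would be a `CREATE INDEX` on the columns the panel queries filter on, verified against the actual execution plan.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's query plan for `sql` as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column holds the plan detail


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE telemetry (device_id TEXT, ts INTEGER, value REAL)")
# One hour of data for 4 hypothetical devices, one sample every 5 seconds.
conn.executemany(
    "INSERT INTO telemetry VALUES (?, ?, ?)",
    [(f"dev{d}", t, float(t)) for d in range(4) for t in range(0, 3600, 5)],
)

# A panel-style query: the last 15 minutes of one device.
PANEL_QUERY = (
    "SELECT ts, value FROM telemetry "
    "WHERE device_id = ? AND ts >= ? ORDER BY ts"
)
params = ("dev1", 3600 - 15 * 60)

plan_before = query_plan(conn, PANEL_QUERY, params)
conn.execute("CREATE INDEX idx_telemetry_device_ts ON telemetry (device_id, ts)")
plan_after = query_plan(conn, PANEL_QUERY, params)

print("before:", plan_before)  # full table scan
print("after: ", plan_after)   # search using idx_telemetry_device_ts
```

Without the index every panel refresh scans the whole table; with it, the engine seeks directly to the requested device and time range, so each query holds its worker for less time.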

huangapple
  • Posted on May 17, 2023 at 23:20:21
  • Article link: https://go.coder-hub.com/76273686.html