Flink job becomes too busy with a small amount of data
Question
I am configuring a Flink job that should handle almost 1 million records per second. I started with the configuration below:
CPU: 4 cores
Memory: 2GB
Task Slots: 4
with only 30k logs per second. But my job still gets far too busy and shows a lot of backpressure. From what I have read, Flink can handle very large amounts of data, so there seems to be a contradiction here; I might have missed some configuration. If anybody can help me figure this out, it would be highly appreciated.
Thank you in advance.
I have tried increasing the memory and the parallelism, but it didn't work for me. I want to understand whether this result is expected with this configuration, or whether I should configure the job in some other way.
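For reference, resources like these are normally declared in flink-conf.yaml. A minimal sketch, where the key names are standard Flink options, the values mirror the setup above, and parallelism.default matching the slot count is an assumption:

```yaml
# Sketch: the setup above expressed as standard Flink configuration keys.
taskmanager.numberOfTaskSlots: 4       # task slots per TaskManager
taskmanager.memory.process.size: 2g    # total TaskManager process memory
parallelism.default: 4                 # assumed to match the slot count
```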
Answer 1
Score: 0
For a workflow reading from Kafka, doing broadcast-stream based enrichment, and writing to Hudi, I got a rate of about 13K records/sec/core. This was with optimizations such as using a faster serde for deserializing records from Kafka.
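A minimal sketch of that kind of pipeline, assuming hypothetical broker address, topic, and rule format (a print sink stands in for the Hudi sink, and the inline rules stream stands in for a real side input):

```java
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.datastream.BroadcastStream;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.co.BroadcastProcessFunction;
import org.apache.flink.util.Collector;

public class EnrichmentJob {
    // Broadcast state holding the enrichment data, keyed by lookup key.
    static final MapStateDescriptor<String, String> RULES = new MapStateDescriptor<>(
        "rules", BasicTypeInfo.STRING_TYPE_INFO, BasicTypeInfo.STRING_TYPE_INFO);

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(4); // match the number of available cores

        KafkaSource<String> source = KafkaSource.<String>builder()
            .setBootstrapServers("kafka:9092")  // hypothetical address
            .setTopics("logs")                  // hypothetical topic
            .setStartingOffsets(OffsetsInitializer.latest())
            .setValueOnlyDeserializer(new SimpleStringSchema())
            .build();

        DataStream<String> events =
            env.fromSource(source, WatermarkStrategy.noWatermarks(), "kafka-logs");

        // Enrichment side input, broadcast to every parallel subtask.
        BroadcastStream<String> rules =
            env.fromElements("key=value").broadcast(RULES); // stand-in for a real rules source

        events.connect(rules)
            .process(new BroadcastProcessFunction<String, String, String>() {
                @Override
                public void processElement(String event, ReadOnlyContext ctx,
                                           Collector<String> out) throws Exception {
                    // May be null until the first broadcast element arrives.
                    String rule = ctx.getBroadcastState(RULES).get("key");
                    out.collect(event + "|" + rule); // enrich the event
                }

                @Override
                public void processBroadcastElement(String rule, Context ctx,
                                                    Collector<String> out) throws Exception {
                    String[] kv = rule.split("=", 2);
                    ctx.getBroadcastState(RULES).put(kv[0], kv[1]);
                }
            })
            .print(); // stand-in for the Hudi sink

        env.execute("broadcast-enrichment");
    }
}
```

The broadcast state is replicated to every parallel subtask, so each record can be enriched locally without a per-record external lookup.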
So with 4 cores, 30K records/second is in the right ballpark.
Note that increasing parallelism without increasing the number of cores available won't help, and typically hurts your throughput.
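Concretely, that means keeping the job parallelism in line with the cores you actually have:

```java
// With 4 cores and 4 task slots, parallelism 4 is the natural fit; raising it
// (e.g. to 8) only multiplexes more subtasks onto the same CPUs.
env.setParallelism(4);
```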