Flink job becomes too busy with a small amount of data


Question


I am configuring a Flink job that should be able to handle almost 1 million records per second. I started with the configuration below:

CPU: 4 cores
Memory: 2 GB
Task slots: 4

With only 30k logs per second, the job already becomes very busy and shows a lot of backpressure. From what I have read, Flink can handle very large amounts of data, so this seems contradictory; I might have missed some configuration. Can anybody help me figure it out? It would be highly appreciated.

Thank you in advance


I have tried increasing the memory and the parallelism, but it didn't work for me. I want to understand whether this result is expected with this configuration, or whether I should configure the job in some other way.
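For reference, this is roughly how that setup maps onto Flink's configuration, assuming a single standalone TaskManager with the resources described above; the keys are standard Flink options, and the values are simply the numbers from the question:

    # flink-conf.yaml (one standalone TaskManager)
    # Total memory for the TaskManager process: the 2 GB from the question.
    taskmanager.memory.process.size: 2048m
    # One slot per core, i.e. the 4 slots / 4 cores above.
    taskmanager.numberOfTaskSlots: 4
    # Default parallelism for submitted jobs.
    parallelism.default: 4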

Answer 1

Score: 0


For a workflow reading from Kafka, doing a broadcast-stream based enrichment, and writing to Hudi, I got a rate of about 13K records/sec/core. That was with optimizations such as using a faster serde for deserializing the records coming off Kafka.
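To make the shape of that workflow concrete, here is a minimal sketch of a Kafka source plus broadcast-state enrichment in the DataStream API. It is not the exact job described above: the broker address, topic name, string-based records, the static rule stream, and the print() stand-in for the Hudi sink are all assumptions for illustration.

    import org.apache.flink.api.common.eventtime.WatermarkStrategy;
    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.api.common.state.MapStateDescriptor;
    import org.apache.flink.connector.kafka.source.KafkaSource;
    import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
    import org.apache.flink.streaming.api.datastream.BroadcastStream;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.functions.co.BroadcastProcessFunction;
    import org.apache.flink.util.Collector;

    public class KafkaBroadcastEnrichmentJob {

        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            // Match parallelism to the cores actually available (4 in the question).
            env.setParallelism(4);

            // High-volume event stream from Kafka. SimpleStringSchema stands in
            // for the "faster serde": in practice you would plug a hand-tuned
            // DeserializationSchema in here instead of a generic JSON parser.
            KafkaSource<String> events = KafkaSource.<String>builder()
                    .setBootstrapServers("kafka:9092")               // placeholder
                    .setTopics("events")                             // placeholder
                    .setGroupId("enrichment-job")
                    .setStartingOffsets(OffsetsInitializer.latest())
                    .setValueOnlyDeserializer(new SimpleStringSchema())
                    .build();
            DataStream<String> eventStream =
                    env.fromSource(events, WatermarkStrategy.noWatermarks(), "events");

            // Low-volume reference data; in a real job this would be another
            // Kafka topic, modeled here as a static stream for brevity.
            DataStream<String> ruleStream =
                    env.fromElements("deviceA=rack-1", "deviceB=rack-2");

            // Broadcast the reference data to every parallel subtask.
            final MapStateDescriptor<String, String> rulesDesc =
                    new MapStateDescriptor<>("rules", String.class, String.class);
            BroadcastStream<String> rules = ruleStream.broadcast(rulesDesc);

            eventStream.connect(rules)
                    .process(new BroadcastProcessFunction<String, String, String>() {
                        @Override
                        public void processElement(String event, ReadOnlyContext ctx,
                                                   Collector<String> out) throws Exception {
                            // Look up the enrichment value; the whole event is
                            // (naively) treated as the lookup key in this sketch.
                            String extra = ctx.getBroadcastState(rulesDesc).get(event);
                            out.collect(extra == null ? event : event + "," + extra);
                        }

                        @Override
                        public void processBroadcastElement(String rule, Context ctx,
                                                            Collector<String> out) throws Exception {
                            String[] kv = rule.split("=", 2);
                            ctx.getBroadcastState(rulesDesc).put(kv[0], kv[1]);
                        }
                    })
                    .print(); // stand-in for the Hudi sink, whose setup is out of scope here

            env.execute("kafka-broadcast-enrichment");
        }
    }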

So with 4 cores, 30K records/second is in the right ballpark.

Note that increasing parallelism without increasing the number of cores available won't help, and typically hurts your throughput.
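As a minimal illustration of keeping parallelism aligned with cores (assuming the 4-core setup from the question):

    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class ParallelismDemo {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            // 4 slots backed by 4 cores: parallelism 4 gives each subtask a core.
            // Raising it to, say, 8 would time-slice the same 4 cores and
            // typically lowers throughput rather than raising it.
            env.setParallelism(4);
            env.fromSequence(0, 999).print();
            env.execute("parallelism-demo");
        }
    }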
