How can I create a consumer group in a Spark Kafka stream and assign consumers to it?
Question
I have one topic named topic_1 with 4 partitions. I need to read from it in parallel in a Kafka Spark stream, so I need to create one consumer group with multiple consumers.
Can you please help me with how to do this?
Right now the Kafka Spark stream processes only one request from Kafka at a time.
Answer 1
Score: 1
Assuming you are using KafkaUtils from Spark, it will automatically take advantage of the number of Spark executors * cores per executor.
So, if you have 2 Spark executors with 2 cores each, Spark will automatically consume the 4 topic partitions in parallel.
In the Kafka Spark Streaming integration, the number of input tasks is determined by the number of partitions in the topic. If your topic has 4 partitions, Spark Streaming will spawn 4 tasks for each batch.
If you have 1 executor with 1 core, that core will execute the 4 tasks sequentially (no parallelism). Whereas if you have 2 executors with 1 core each, each core will execute 2 tasks sequentially (so the parallelism is 2).
With 4 partitions, you should configure any of the following to achieve maximum consumer parallelism:
- 1 executor with 4 cores
- 2 executors with 2 cores each
- 4 executors with 1 core each
Comments