I have more data in a Kafka topic, but when I extract it with my PySpark application I get only 1 row. How do I fix this?
Question
I have more data in a Kafka topic, but when I extract it with my PySpark application (which I also use to extract from other Kafka topics), I get only 1 row. Previously, I extracted data from the same topic with the same PySpark application/code without any issues.
One thing I want to highlight: I have tried extracting data from this topic multiple times, both from the same Databricks notebook and from different Databricks notebooks. My suspicion is that I may have read from the same topic from two different notebooks at the same time in the same Databricks instance, and that this caused the problem I am now seeing. How can I troubleshoot and fix this?
I am new to Kafka and PySpark.
Answer 1
Score: 1
> Previously I had extracted data from the same topic using the same pyspark application/code without any issues.
If you're using the same `kafka.group.id`, then the consumed offsets are tracked under that group, and you'll need to reset the consumer group's offsets using the Kafka tools. Otherwise, you'll only consume new data that arrives after the offsets that were previously consumed and committed.
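As a rough sketch of how to check this, the snippet below builds the options for a one-off batch read that explicitly starts from the earliest offset, plus the arguments for Kafka's `kafka-consumer-groups.sh` tool to reset a group to the earliest offsets. The function names, broker address, topic name, and group id are placeholders I made up for illustration, not values from the question. It assumes an existing `SparkSession` with the Kafka connector available (as on Databricks) and that the consumer group is inactive while the reset runs.

```python
from typing import Dict, List


def kafka_batch_options(bootstrap_servers: str, topic: str) -> Dict[str, str]:
    """Options for a one-off batch read of everything currently in the topic."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        # Read from the first available offset up to the current end of
        # each partition, stated explicitly rather than relying on defaults.
        "startingOffsets": "earliest",
        "endingOffsets": "latest",
    }


def reset_offsets_command(bootstrap_servers: str, group_id: str, topic: str) -> List[str]:
    """Arguments for Kafka's CLI to move a group's offsets back to the start."""
    return [
        "kafka-consumer-groups.sh",
        "--bootstrap-server", bootstrap_servers,
        "--group", group_id,
        "--topic", topic,
        "--reset-offsets", "--to-earliest",
        "--execute",  # without this flag the tool only prints the planned change
    ]


def count_topic_rows(spark, bootstrap_servers: str, topic: str) -> int:
    """Count the records a batch read sees; `spark` is an existing SparkSession."""
    df = (
        spark.read.format("kafka")
        .options(**kafka_batch_options(bootstrap_servers, topic))
        .load()
    )
    return df.count()


if __name__ == "__main__":
    # Placeholder values; replace with your own broker, group, and topic.
    print(" ".join(reset_offsets_command("localhost:9092", "my-group", "my-topic")))
```

If `count_topic_rows` reports more than 1 row, the data is still in the topic and the problem lies in where the application starts reading. Note that a streaming query (`readStream`) with a checkpoint location resumes from the offsets stored in that checkpoint, so `startingOffsets` only takes effect when the query starts with a fresh checkpoint directory.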