My Kafka topic holds many records, but my PySpark application extracts only 1 row. How do I fix this?

My Kafka topic contains many records, but when I extract data with my PySpark application (which I use to read from several Kafka topics), I get only 1 row. Previously, I extracted data from this same topic with the same PySpark application/code without any issues.

One thing I want to highlight: I have tried extracting data from the topic multiple times, both from the same Databricks notebook and from a different Databricks notebook. My suspicion is that I may have read the same topic from two notebooks at the same time in the same Databricks instance, and that this caused the problem. How can I troubleshoot and fix this?

I am new to Kafka and PySpark.


Score: 1


> Previously I had extracted data from the same topic using the same pyspark application/code without any issues.

If you're using the same kafka.group.id, the consumed offsets are tracked under that group ID, so you'll only receive data that arrived after the offsets that were previously consumed and committed. To read the older records again, you'll need to reset the consumer group's offsets using the Kafka tools.
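
To check whether the older records are still in the topic, one option is a one-off batch read with explicit offsets. This is only a minimal sketch, assuming the Spark Kafka connector is available on the cluster and that `spark` is the session the Databricks notebook provides. The helper names, the broker address, and the topic name are hypothetical and not taken from the question. If the job is a Structured Streaming query, its progress is normally kept in the checkpoint location, so a new checkpoint location may also be needed for `startingOffsets` to take effect.

```python
def kafka_batch_options(bootstrap_servers, topic):
    """Build options for a one-off batch read of everything retained in the topic."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        # Start at the oldest retained record and stop at the newest one.
        "startingOffsets": "earliest",
        "endingOffsets": "latest",
    }


def read_whole_topic(spark, bootstrap_servers, topic):
    """Batch-read the topic; `spark` is an existing SparkSession."""
    options = kafka_batch_options(bootstrap_servers, topic)
    return spark.read.format("kafka").options(**options).load()


# Hypothetical usage in a notebook (broker and topic names are placeholders):
# df = read_whole_topic(spark, "broker1:9092", "my_topic")
# df.groupBy("partition").count().show()  # rows actually read per partition
```

If this read also returns a single row, the topic probably holds only that record now (for example because of retention), and the offsets are less likely to be the cause.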

  • Posted on March 1, 2023, 15:37:45
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75600735.html



