March 1, 2023, 15:37:45

There is more data in my Kafka topic, but my PySpark application extracts only 1 row. How do I fix this?

Question

There is more data in the Kafka topic, but when I extract it with my PySpark application (which I use to extract from different Kafka topics), I get only 1 row. Previously, I extracted data from the same topic using the same PySpark application/code without any issues.

One thing I want to highlight: I tried extracting data from this topic multiple times, both from the same Databricks notebook and from different Databricks notebooks. My suspicion is that I may have extracted data from the same topic from two different notebooks at the same time in the same Databricks instance, and that this caused the problem. How can I troubleshoot and fix this?

I am new to Kafka and PySpark.

Answer 1

Score: 1

> Previously I had extracted data from the same topic using the same pyspark application/code without any issues.

If you're using the same kafka.group.id, then the consumed offsets are tracked under that group ID, and you'll need to reset the consumer group's offsets using Kafka tools. Otherwise, you'll only consume new data that arrives after the offsets that were previously consumed and committed.
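One way to do the reset is the kafka-consumer-groups.sh script that ships with Apache Kafka. The sketch below only builds the command lines in Python so they can be inspected before anything is run. The broker address, group ID, and topic name are placeholder assumptions, and the helper function names are made up for illustration. The tool will not reset offsets while the group still has active consumers, so stop any notebooks reading with that group ID first.

```python
import shlex


def consumer_groups_cmd(bootstrap_server, group, *extra):
    """Build an argv list for Kafka's kafka-consumer-groups.sh tool."""
    return [
        "kafka-consumer-groups.sh",
        "--bootstrap-server", bootstrap_server,
        "--group", group,
        *extra,
    ]


def describe_cmd(bootstrap_server, group):
    """Command that lists the group's committed offsets and lag per partition."""
    return consumer_groups_cmd(bootstrap_server, group, "--describe")


def reset_to_earliest_cmd(bootstrap_server, group, topic, execute=False):
    """Command that rewinds the group's offsets for one topic to the earliest.

    With execute=False the tool only previews the change (--dry-run).
    """
    mode = "--execute" if execute else "--dry-run"
    return consumer_groups_cmd(
        bootstrap_server, group,
        "--topic", topic,
        "--reset-offsets", "--to-earliest",
        mode,
    )


# Placeholder values (assumptions): replace with your own broker, group, topic.
BROKER = "localhost:9092"
GROUP = "my-group"
TOPIC = "my-topic"

# Step 1: inspect where the group currently stands.
print(shlex.join(describe_cmd(BROKER, GROUP)))

# Step 2: preview the reset, then rerun with execute=True to apply it.
print(shlex.join(reset_to_earliest_cmd(BROKER, GROUP, TOPIC)))
```

The printed commands can be pasted into a shell on a machine with the Kafka CLI installed, or passed as lists to subprocess.run(cmd, check=True). Running the describe command first shows whether the group's committed offset is already near the end of the topic, which would explain why only one new row comes back.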
