Where to begin with fixing an offsets issue with Debezium Engine
Question
I'm using Debezium Engine to sync data from a MySQL database. Because I'm using Debezium Engine, I use org.apache.kafka.connect.storage.FileOffsetBackingStore to record my current change offset. I think my computer recently lost power, which corrupted my offset file. Now, when I try to run my Debezium Engine app, I get this error from Debezium:
ERROR io.debezium.embedded.EmbeddedEngine - Unable to configure and start the 'org.apache.kafka.connect.storage.FileOffsetBackingStore' offset backing store
org.apache.kafka.connect.errors.ConnectException: java.io.StreamCorruptedException: invalid stream header: 00000000
at org.apache.kafka.connect.storage.FileOffsetBackingStore.load(FileOffsetBackingStore.java:86)
at org.apache.kafka.connect.storage.FileOffsetBackingStore.start(FileOffsetBackingStore.java:59)
at io.debezium.embedded.EmbeddedEngine.run(EmbeddedEngine.java:691)
at io.debezium.embedded.ConvertingEngineBuilder$2.run(ConvertingEngineBuilder.java:192)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: java.io.StreamCorruptedException: invalid stream header: 00000000
at java.base/java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:987)
at java.base/java.io.ObjectInputStream.<init>(ObjectInputStream.java:414)
at org.apache.kafka.connect.util.SafeObjectInputStream.<init>(SafeObjectInputStream.java:48)
at org.apache.kafka.connect.storage.FileOffsetBackingStore.load(FileOffsetBackingStore.java:71)
... 8 common frames omitted
I'd like to fix this the proper way, by repairing the Debezium offset file, but I don't know where to begin. I think I first need to figure out which offset I want, which would be the offset at which it was failing (or one shortly before it). I may be able to get the offset from the corrupted file, but if not, can I use a timestamp to find a good offset to start at? It looks like I can use this tool (https://github.com/nathan-smit-1/HashmapEditor) to update the file to point to an offset of my choice once I know it. How, though, do I get a list of offsets in chronological order so I know which one to change it to?
Answer 1
Score: 0
After a lot of trial and error and online research, I found that the best solution was the following steps:
1. Delete the corrupt offsets.dat file.
2. Start Debezium to generate a new, working offsets.dat file for this machine.
3. Use past Debezium logs to find an offset that Debezium processed recently (the more recent, the better).
4. Edit the offsets.dat file using a tool that can binary-serialize the offset information you found in step 3. A hex editor might work depending on what you're changing, but I can't confirm that; I used a Java app to write the data to the file.
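For the last step, here is a minimal sketch of what such a Java app could look like. It is not the exact program I used. It assumes, based on my reading of the Kafka Connect source, that FileOffsetBackingStore stores a Java-serialized HashMap<byte[], byte[]> whose keys and values are JSON strings encoded as bytes. The class name OffsetFileSketch and the key and value in main are hypothetical placeholders: the real strings depend on your engine name, connector, and Debezium version, so copy them from the freshly generated file (readOffsets prints them) and change only the binlog file and position.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class OffsetFileSketch {

    // Writes one key/value pair as a Java-serialized HashMap<byte[], byte[]>.
    // ASSUMPTION: this matches the on-disk layout FileOffsetBackingStore expects.
    static void writeOffsets(Path file, String keyJson, String valueJson) throws IOException {
        Map<byte[], byte[]> raw = new HashMap<>();
        raw.put(keyJson.getBytes(StandardCharsets.UTF_8),
                valueJson.getBytes(StandardCharsets.UTF_8));
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeObject(raw);
        }
    }

    // Reads the serialized map back and decodes every entry to text,
    // which is also a handy way to inspect a working offsets file.
    static Map<String, String> readOffsets(Path file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            Map<?, ?> raw = (Map<?, ?>) in.readObject();
            Map<String, String> decoded = new LinkedHashMap<>();
            for (Map.Entry<?, ?> entry : raw.entrySet()) {
                decoded.put(decode(entry.getKey()), decode(entry.getValue()));
            }
            return decoded;
        }
    }

    // Entries may be null, so decode defensively.
    private static String decode(Object bytes) {
        return bytes == null ? null : new String((byte[]) bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        // Writes to a temp file unless a path is given, so nothing real is overwritten by accident.
        Path file = args.length > 0 ? Path.of(args[0]) : Files.createTempFile("offsets-sketch", ".dat");

        // HYPOTHETICAL key and value: replace with the strings from your own offsets file and logs.
        String key = "[\"my-engine\",{\"server\":\"my-server\"}]";
        String value = "{\"file\":\"mysql-bin.000003\",\"pos\":154}";

        writeOffsets(file, key, value);
        readOffsets(file).forEach((k, v) -> System.out.println(k + " -> " + v));
    }
}
```

Stop the engine and back up the working offsets.dat before overwriting it. If the engine refuses to load the rewritten file, restore the backup and compare the two files with readOffsets to see which part of the key or value differs.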