Scala module requiring specific version of data bind for Spark

Question
I am having issues trying to get Spark to load, read, and query a Parquet file. The infrastructure seems to be set up (Spark standalone 3.0): the cluster is visible and picks up jobs.
The issue appears when this line is called:
Dataset<Row> parquetFileDF = sparkSession.read().parquet(parquePath);
the following error is thrown:
Caused by: com.fasterxml.jackson.databind.JsonMappingException: Scala module 2.10.0 requires Jackson Databind version >= 2.10.0 and < 2.11.0
at com.fasterxml.jackson.module.scala.JacksonModule.setupModule(JacksonModule.scala:61)
I looked into JacksonModule.setupModule, and when it gets to context.getMapperVersion, the version being passed is 2.9.10. It appears to me that DefaultScalaModule is pulling in some older version.
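To trace where the 2.9.10 databind comes from, Gradle's dependency-insight report can show the full resolution path. A diagnostic sketch, assuming a standard Gradle wrapper (the configuration name may differ in your build):

./gradlew dependencyInsight --dependency jackson-databind --configuration runtimeClasspath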
I'm using Gradle to build and have the dependencies set up as follows:
implementation 'com.fasterxml.jackson.core:jackson-core:2.10.0'
implementation 'com.fasterxml.jackson.core:jackson-databind:2.10.0'
implementation 'org.apache.spark:spark-core_2.12:3.0.0'
implementation 'org.apache.spark:spark-sql_2.12:3.0.0'
implementation 'org.apache.spark:spark-launcher_2.12:3.0.0'
implementation 'org.apache.spark:spark-catalyst_2.12:3.0.0'
implementation 'org.apache.spark:spark-streaming_2.12:3.0.0'
That didn't work, so I tried forcing databind:
implementation('com.fasterxml.jackson.core:jackson-databind') {
    version {
        strictly '2.10.0'
    }
}
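For completeness, Gradle's resolutionStrategy.force is another way to pin a version across every configuration. This is a sketch, not the build above; the jackson-module-scala_2.12 coordinate is assumed from the error message rather than taken from the question:

configurations.all {
    resolutionStrategy {
        // Override whatever versions transitive dependencies request.
        force 'com.fasterxml.jackson.core:jackson-databind:2.10.0',
              'com.fasterxml.jackson.module:jackson-module-scala_2.12:2.10.0'
    }
}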
I've tried a few different versions and still keep hitting this issue. Maybe I'm missing something super simple, but right now, I can't seem to get past this error.
Any help would be appreciated.
Answer 1
Score: 4
I was able to figure out the issue. I was pulling in a jar file from another project. The functionality in that jar wasn't being used at all, so it wasn't suspect. Unfortunately, that project hadn't been updated, and some older Spark libraries it bundled were somehow being picked up by my currently running app. Once I removed that jar, the error went away. What's interesting is that the dependency graph didn't show anything about the libraries the other jar file was using.
I suppose if you run into a similar issue, double-check any jar files being imported.
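One quick runtime check if you suspect a bundled copy of Jackson: print which jar the ObjectMapper class was actually loaded from, and which version is baked into it. A minimal sketch (the class name JacksonOriginCheck is made up for illustration):

import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.cfg.PackageVersion;
import java.net.URL;

public class JacksonOriginCheck {
    public static void main(String[] args) {
        // getCodeSource() can be null for bootstrap classes, but not for
        // a library class like ObjectMapper loaded from a jar on the classpath.
        URL location = ObjectMapper.class
                .getProtectionDomain()
                .getCodeSource()
                .getLocation();
        System.out.println("jackson-databind loaded from: " + location);
        // PackageVersion reports the version compiled into that jar.
        System.out.println("databind version: " + PackageVersion.VERSION);
    }
}

If the printed location points at a jar you didn't expect, that jar is shadowing your declared dependency.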