Got "java.lang.ClassNotFoundException: org.apache.spark.sql.catalyst.FileSourceOptions$" when spark-submit to Amazon EMR


I have a Spark application. My build.sbt looks like:

    name := "IngestFromS3ToKafka"
    version := "1.0"
    scalaVersion := "2.12.17"
    resolvers += "confluent" at "https://packages.confluent.io/maven/"
    val sparkVersion = "3.3.1"
    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core" % sparkVersion % "provided",
      "org.apache.spark" %% "spark-sql" % sparkVersion % "provided",
      "org.apache.hadoop" % "hadoop-common" % "3.3.5" % "provided",
      "org.apache.hadoop" % "hadoop-aws" % "3.3.5" % "provided",
      "com.amazonaws" % "aws-java-sdk-bundle" % "1.12.475" % "provided",
      "org.apache.spark" %% "spark-avro" % sparkVersion,
      "org.apache.spark" %% "spark-sql-kafka-0-10" % sparkVersion,
      "io.delta" %% "delta-core" % "2.4.0",
      "za.co.absa" %% "abris" % "6.3.0"
    )
    ThisBuild / assemblyMergeStrategy := {
      // https://stackoverflow.com/a/67937671/2000548
      case PathList("module-info.class") => MergeStrategy.discard
      case x if x.endsWith("/module-info.class") => MergeStrategy.discard
      // https://stackoverflow.com/a/76129963/2000548
      case PathList("org", "apache", "spark", "unused", "UnusedStubClass.class") => MergeStrategy.first
      case x =>
        val oldStrategy = (ThisBuild / assemblyMergeStrategy).value
        oldStrategy(x)
    }

I have an Amazon EMR cluster running Spark 3.3.1.
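Before chasing dependency issues, it can help to double-check the runtime Spark version on the cluster itself, since EMR ships its own Spark build. A quick check from spark-shell on the master node (the expected value here is taken from the submit log below, which reports 3.3.1-amzn-0):

    // Run inside spark-shell on the EMR master node.
    // `spark` is the SparkSession that spark-shell creates for you.
    println(spark.version) // should print something like "3.3.1-amzn-0" on this cluster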

When I spark-submit to Amazon EMR, I get this error:

    /bin/bash -c "/usr/bin/spark-submit --master yarn --deploy-mode client --class com.hongbomiao.IngestFromS3ToKafka --name ingest-from-s3-to-kafka /home/hadoop/IngestFromS3ToKafka-assembly-1.0.jar"
    23/05/31 22:35:31 INFO SparkContext: Running Spark version 3.3.1-amzn-0
    # ...
    Exception in thread "main" com.google.common.util.concurrent.ExecutionError: java.lang.NoClassDefFoundError: org/apache/spark/sql/catalyst/FileSourceOptions$
        at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2261)
        at com.google.common.cache.LocalCache.get(LocalCache.java:4000)
        at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4789)
        at org.apache.spark.sql.delta.DeltaLog$.getDeltaLogFromCache$1(DeltaLog.scala:801)
        at org.apache.spark.sql.delta.DeltaLog$.apply(DeltaLog.scala:811)
        at org.apache.spark.sql.delta.DeltaLog$.apply(DeltaLog.scala:715)
        at org.apache.spark.sql.delta.DeltaLog$.forTable(DeltaLog.scala:663)
        at org.apache.spark.sql.delta.DeltaLog$.$anonfun$forTableWithSnapshot$1(DeltaLog.scala:720)
        at org.apache.spark.sql.delta.DeltaLog$.withFreshSnapshot(DeltaLog.scala:753)
        at org.apache.spark.sql.delta.DeltaLog$.forTableWithSnapshot(DeltaLog.scala:720)
        at org.apache.spark.sql.delta.sources.DeltaDataSource.sourceSchema(DeltaDataSource.scala:91)
        at org.apache.spark.sql.execution.datasources.DataSource.sourceSchema(DataSource.scala:236)
        at org.apache.spark.sql.execution.datasources.DataSource.sourceInfo$lzycompute(DataSource.scala:118)
        at org.apache.spark.sql.execution.datasources.DataSource.sourceInfo(DataSource.scala:118)
        at org.apache.spark.sql.execution.streaming.StreamingRelation$.apply(StreamingRelation.scala:34)
        at org.apache.spark.sql.streaming.DataStreamReader.loadInternal(DataStreamReader.scala:168)
        at org.apache.spark.sql.streaming.DataStreamReader.load(DataStreamReader.scala:210)
        at com.hongbomiao.IngestFromS3ToKafka$.main(IngestFromS3ToKafka.scala:29)
        at com.hongbomiao.IngestFromS3ToKafka.main(IngestFromS3ToKafka.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1006)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1095)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1104)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
    Caused by: java.lang.NoClassDefFoundError: org/apache/spark/sql/catalyst/FileSourceOptions$
        at org.apache.spark.sql.delta.DeltaLog.indexToRelation(DeltaLog.scala:181)
        at org.apache.spark.sql.delta.DeltaLog.loadIndex(DeltaLog.scala:196)
        at org.apache.spark.sql.delta.Snapshot.$anonfun$protocolAndMetadataReconstruction$1(Snapshot.scala:210)
        at scala.collection.immutable.List.map(List.scala:293)
        at org.apache.spark.sql.delta.Snapshot.protocolAndMetadataReconstruction(Snapshot.scala:210)
        at org.apache.spark.sql.delta.Snapshot.x$1$lzycompute(Snapshot.scala:134)
        at org.apache.spark.sql.delta.Snapshot.x$1(Snapshot.scala:129)
        at org.apache.spark.sql.delta.Snapshot._metadata$lzycompute(Snapshot.scala:129)
        at org.apache.spark.sql.delta.Snapshot._metadata(Snapshot.scala:129)
        at org.apache.spark.sql.delta.Snapshot.metadata(Snapshot.scala:197)
        at org.apache.spark.sql.delta.Snapshot.toString(Snapshot.scala:390)
        at java.lang.String.valueOf(String.java:2994)
        at java.lang.StringBuilder.append(StringBuilder.java:136)
        at org.apache.spark.sql.delta.Snapshot.$anonfun$new$1(Snapshot.scala:393)
        at org.apache.spark.sql.delta.Snapshot.$anonfun$logInfo$1(Snapshot.scala:370)
        at org.apache.spark.internal.Logging.logInfo(Logging.scala:61)
        at org.apache.spark.internal.Logging.logInfo$(Logging.scala:60)
        at org.apache.spark.sql.delta.Snapshot.logInfo(Snapshot.scala:370)
        at org.apache.spark.sql.delta.Snapshot.<init>(Snapshot.scala:393)
        at org.apache.spark.sql.delta.SnapshotManagement.$anonfun$createSnapshot$4(SnapshotManagement.scala:356)
        at org.apache.spark.sql.delta.SnapshotManagement.createSnapshotFromGivenOrEquivalentLogSegment(SnapshotManagement.scala:480)
        at org.apache.spark.sql.delta.SnapshotManagement.createSnapshotFromGivenOrEquivalentLogSegment$(SnapshotManagement.scala:468)
        at org.apache.spark.sql.delta.DeltaLog.createSnapshotFromGivenOrEquivalentLogSegment(DeltaLog.scala:74)
        at org.apache.spark.sql.delta.SnapshotManagement.createSnapshot(SnapshotManagement.scala:349)
        at org.apache.spark.sql.delta.SnapshotManagement.createSnapshot$(SnapshotManagement.scala:343)
        at org.apache.spark.sql.delta.DeltaLog.createSnapshot(DeltaLog.scala:74)
        at org.apache.spark.sql.delta.SnapshotManagement.$anonfun$createSnapshotAtInitInternal$1(SnapshotManagement.scala:304)
        at scala.Option.map(Option.scala:230)
        at org.apache.spark.sql.delta.SnapshotManagement.createSnapshotAtInitInternal(SnapshotManagement.scala:301)
        at org.apache.spark.sql.delta.SnapshotManagement.createSnapshotAtInitInternal$(SnapshotManagement.scala:298)
        at org.apache.spark.sql.delta.DeltaLog.createSnapshotAtInitInternal(DeltaLog.scala:74)
        at org.apache.spark.sql.delta.SnapshotManagement.$anonfun$getSnapshotAtInit$1(SnapshotManagement.scala:293)
        at org.apache.spark.sql.delta.metering.DeltaLogging.recordFrameProfile(DeltaLogging.scala:140)
        at org.apache.spark.sql.delta.metering.DeltaLogging.recordFrameProfile$(DeltaLogging.scala:138)
        at org.apache.spark.sql.delta.DeltaLog.recordFrameProfile(DeltaLog.scala:74)
        at org.apache.spark.sql.delta.SnapshotManagement.getSnapshotAtInit(SnapshotManagement.scala:288)
        at org.apache.spark.sql.delta.SnapshotManagement.getSnapshotAtInit$(SnapshotManagement.scala:287)
        at org.apache.spark.sql.delta.DeltaLog.getSnapshotAtInit(DeltaLog.scala:74)
        at org.apache.spark.sql.delta.SnapshotManagement.$init$(SnapshotManagement.scala:57)
        at org.apache.spark.sql.delta.DeltaLog.<init>(DeltaLog.scala:80)
        at org.apache.spark.sql.delta.DeltaLog$.$anonfun$apply$4(DeltaLog.scala:790)
        at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:323)
        at org.apache.spark.sql.delta.DeltaLog$.$anonfun$apply$3(DeltaLog.scala:785)
        at org.apache.spark.sql.delta.metering.DeltaLogging.recordFrameProfile(DeltaLogging.scala:140)
        at org.apache.spark.sql.delta.metering.DeltaLogging.recordFrameProfile$(DeltaLogging.scala:138)
        at org.apache.spark.sql.delta.DeltaLog$.recordFrameProfile(DeltaLog.scala:595)
        at org.apache.spark.sql.delta.metering.DeltaLogging.$anonfun$recordDeltaOperationInternal$1(DeltaLogging.scala:133)
        at com.databricks.spark.util.DatabricksLogging.recordOperation(DatabricksLogging.scala:128)
        at com.databricks.spark.util.DatabricksLogging.recordOperation$(DatabricksLogging.scala:117)
        at org.apache.spark.sql.delta.DeltaLog$.recordOperation(DeltaLog.scala:595)
        at org.apache.spark.sql.delta.metering.DeltaLogging.recordDeltaOperationInternal(DeltaLogging.scala:132)
        at org.apache.spark.sql.delta.metering.DeltaLogging.recordDeltaOperation(DeltaLogging.scala:122)
        at org.apache.spark.sql.delta.metering.DeltaLogging.recordDeltaOperation$(DeltaLogging.scala:112)
        at org.apache.spark.sql.delta.DeltaLog$.recordDeltaOperation(DeltaLog.scala:595)
        at org.apache.spark.sql.delta.DeltaLog$.createDeltaLog$1(DeltaLog.scala:784)
        at org.apache.spark.sql.delta.DeltaLog$.$anonfun$apply$5(DeltaLog.scala:802)
        at com.google.common.cache.LocalCache$LocalManualCache$1.load(LocalCache.java:4792)
        at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)
        at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)
        at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)
        at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2257)
        ... 30 more
    Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.catalyst.FileSourceOptions$
        at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
        ... 91 more

Answer 1

Score: 1

Initially I thought the issue was related to the spark-sql library.

Later I found the real cause: delta-core 2.4.0 is not compatible with Spark 3.3.1.

This is based on the Delta Lake / Apache Spark compatibility table at https://docs.delta.io/latest/releases.html#compatibility-with-apache-spark

Once I changed

    "io.delta" %% "delta-core" % "2.4.0",

to

    "io.delta" %% "delta-core" % "2.3.0",

recompiled, and submitted again, it works!
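One way to make this pairing harder to break in the future is to declare the Delta version right next to the Spark version in build.sbt, with the compatibility mapping spelled out in a comment. This is a sketch, not the asker's exact file; verify the mapping against the Delta releases page for your versions:

    // build.sbt (sketch): pin the Delta version next to the Spark version,
    // so bumping one forces a look at the other.
    // Per the Delta compatibility table (verify for your versions):
    //   delta-core 2.3.x pairs with Spark 3.3.x
    //   delta-core 2.4.x pairs with Spark 3.4.x
    val sparkVersion = "3.3.1" // must match the Spark version on the EMR cluster
    val deltaVersion = "2.3.0" // compatible with Spark 3.3.x

    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-sql"  % sparkVersion % "provided",
      "io.delta"         %% "delta-core" % deltaVersion
    )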


huangapple
  • Posted on 2023-06-01 07:14:13
  • Please keep the original link when reposting: https://go.coder-hub.com/76377810.html