Databricks Notebook Scala Spark Connect to MongoDB Could not initialize class com.mongodb.spark.config.ReadConfig$

Question

I'm using a Databricks Scala notebook with Spark to connect to MongoDB and I'm trying to understand why I'm getting this error when I try to connect to my MongoDB cluster. I simply want to able to read my from database but I'm not sure why this error keeps coming up.

    java.lang.NoClassDefFoundError: Could not initialize class com.mongodb.spark.config.ReadConfig$

My code where I'm attempting to read from MongoDB is shown here.

    import org.apache.log4j.{Level, Logger}
    import org.apache.spark.ml.evaluation.RegressionEvaluator
    import org.apache.spark.ml.recommendation.ALS
    import org.apache.spark.ml.tuning.{ParamGridBuilder, TrainValidationSplit}
    import org.apache.spark.sql.SQLContext
    import org.apache.spark.{SparkConf, SparkContext}
    import com.mongodb.spark.MongoSpark
    import com.mongodb.spark.config.{ReadConfig, WriteConfig}
    import com.mongodb.spark._
    import com.mongodb.spark.config._

    val data = spark.read.format("com.mongodb.spark.sql.DefaultSource")
      .option("database", "sample_airbnb")
      .option("collection", "listingsAndReviews")
      .load()
    data.show()

I've also installed the following libraries in my notebook library

  1. org.mongodb.spark:mongo-spark-connector_2.12:2.4.0
  2. mongodb_driver_3_12_3_javadoc.jar
  3. bson_3_12_3_javadoc.jar

These are the URIs used in the Spark config:

    spark.mongodb.input.uri mongodb+srv://<user>:<password>@cluster0-ofrzm.azure.mongodb.net/test?retryWrites=true&w=majority
    spark.mongodb.output.uri mongodb+srv://<user>:<password>@cluster0-ofrzm.azure.mongodb.net/test?retryWrites=true&w=majority
    spark.databricks.delta.preview.enabled true

Any help is greatly appreciated!

Answer 1

Score: 1

I had the same connection problem on Dataproc using PySpark.

My solution:

Install these JARs:

  1. https://mvnrepository.com/artifact/org.mongodb.spark/mongo-spark-connector_2.11/2.4.0
  2. https://repo1.maven.org/maven2/org/mongodb/bson/
  3. https://repo1.maven.org/maven2/org/mongodb/mongodb-driver/
  4. https://repo1.maven.org/maven2/org/mongodb/mongodb-driver-core/

PySpark:

    from pyspark.sql import SparkSession

    # Maven coordinates take the form group:artifact:version,
    # so the version is separated by a colon, not a hyphen.
    spark = SparkSession.builder \
        .master('local') \
        .config('spark.mongodb.input.uri', 'mongodb://{ Host }:{ Port }/{ DB }.{ Collection }') \
        .config('spark.mongodb.output.uri', 'mongodb://{ Host }:{ Port }/{ DB }.{ Collection }') \
        .config('spark.jars.packages', 'org.mongodb.spark:mongo-spark-connector_2.11:2.4.0') \
        .getOrCreate()

    df = spark.read \
        .format("com.mongodb.spark.sql.DefaultSource") \
        .option("database", { DB }) \
        .option("collection", { Collection }) \
        .load()
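
For the original Scala notebook on Databricks, the equivalent read would look roughly like the sketch below. This is a minimal sketch, assuming the connector is attached to the cluster via the Maven coordinate above (so its transitive dependencies come along) and that spark.mongodb.input.uri is set in the cluster's Spark config; ReadConfig(sc) picks the URI up from there.

    import com.mongodb.spark.MongoSpark
    import com.mongodb.spark.config.ReadConfig

    // Build a ReadConfig that overrides the database/collection while inheriting
    // the connection URI from spark.mongodb.input.uri in the SparkContext config.
    val readConfig = ReadConfig(
      Map("database" -> "sample_airbnb", "collection" -> "listingsAndReviews"),
      Some(ReadConfig(sc))
    )

    val df = MongoSpark.load(spark, readConfig)
    df.show()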

Answer 2

Score: 0

There are a couple of different issues that could cause this:

  • You're using connector compiled with Scala 2.12 on Databricks runtime that uses Scala 2.11 - this is most probable issue, as DBR 7.0 that uses Scala 2.12 was released almost 2 months later. The rule of thumb - for DBR < 7.0, use artifact 2.4.x with _2.11 in name, for DBR >= 7.0, use _2.12 and version 3.0.0 of that library
  • You don't have all dependencies downloaded. Connector depends on many other libraries that need to be available. It's better to specify library as Maven coordinates: org.mongodb.spark:mongo-spark-connector_2.11-2.4.0 - this will pull all necessary dependencies
