May 4, 2020 23:47:17

Databricks Notebook Scala Spark Connect to MongoDB Could not initialize class com.mongodb.spark.config.ReadConfig$

Question

I'm using a Databricks Scala notebook with Spark to connect to MongoDB, and I'm trying to understand why I get the following error when I try to connect to my MongoDB cluster. I simply want to be able to read from my database, but I'm not sure why this error keeps coming up.

java.lang.NoClassDefFoundError: Could not initialize class com.mongodb.spark.config.ReadConfig$

My code where I'm attempting to read from MongoDB is shown here.

import org.apache.log4j.{Level, Logger}
import org.apache.spark.ml.evaluation.RegressionEvaluator
import org.apache.spark.ml.recommendation.ALS
import org.apache.spark.ml.tuning.{ParamGridBuilder, TrainValidationSplit}
import org.apache.spark.sql.SQLContext
import org.apache.spark.{SparkConf, SparkContext}

import com.mongodb.spark.MongoSpark
import com.mongodb.spark.config.{ReadConfig, WriteConfig}
import com.mongodb.spark._
import com.mongodb.spark.config._

val data = spark.read.format("com.mongodb.spark.sql.DefaultSource").option("database", "sample_airbnb").option("collection", "listingsAndReviews").load()
data.show()

I've also installed the following libraries in my notebook's library section:

org.mongodb.spark:mongo-spark-connector_2.12:2.4.0
mongodb_driver_3_12_3_javadoc.jar
mongodb_driver_3_12_3_javadoc.jar
bson_3_12_3_javadoc.jar

These are the URIs used in the Spark config:

spark.mongodb.input.uri mongodb+srv://<user>:<password>@cluster0-ofrzm.azure.mongodb.net/test?retryWrites=true&w=majority
spark.mongodb.output.uri mongodb+srv://<user>:<password>@cluster0-ofrzm.azure.mongodb.net/test?retryWrites=true&w=majority
spark.databricks.delta.preview.enabled true

Any help is greatly appreciated!

Answer 1

Score: 1

I had the same connection problem on Dataproc using PySpark.

My solution:

Install these JARs:

https://mvnrepository.com/artifact/org.mongodb.spark/mongo-spark-connector_2.11/2.4.0
https://repo1.maven.org/maven2/org/mongodb/bson/
https://repo1.maven.org/maven2/org/mongodb/mongodb-driver/
https://repo1.maven.org/maven2/org/mongodb/mongodb-driver-core/

PySpark:

from pyspark.sql import SparkSession

# Replace each { ... } placeholder with your own value.
# spark.jars.packages expects Maven coordinates in the form group:artifact:version.
spark = SparkSession.builder\
                    .master('local')\
                    .config('spark.mongodb.input.uri', 'mongodb://{ Host }:{ Port }/{ DB }.{ Collection }')\
                    .config('spark.mongodb.output.uri', 'mongodb://{ Host }:{ Port }/{ DB }.{ Collection }')\
                    .config('spark.jars.packages', 'org.mongodb.spark:mongo-spark-connector_2.11:2.4.0')\
                    .getOrCreate()

# Read the collection into a DataFrame through the connector's data source.
df = spark.read\
          .format("com.mongodb.spark.sql.DefaultSource")\
          .option("database", "{ DB }")\
          .option("collection", "{ Collection }")\
          .load()
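
The URI template used in this answer has four parts: host, port, database, and collection. Here is a minimal sketch of filling it in. `mongo_uri` is a hypothetical helper written for illustration, not part of the connector, and `localhost` and `27017` are assumed values; the database and collection names are the ones from the question.

```python
def mongo_uri(host: str, port: int, db: str, collection: str) -> str:
    """Fill in the mongodb://host:port/db.collection template (illustrative helper)."""
    return f"mongodb://{host}:{port}/{db}.{collection}"


# Assumed host and port; database and collection taken from the question.
uri = mongo_uri("localhost", 27017, "sample_airbnb", "listingsAndReviews")
print(uri)
```

The resulting string could then be passed as the value of both `spark.mongodb.input.uri` and `spark.mongodb.output.uri`.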

Answer 2

Score: 0

Several different issues could be causing this:

You're using a connector compiled for Scala 2.12 on a Databricks runtime that uses Scala 2.11. This is the most probable cause, as DBR 7.0, which uses Scala 2.12, was released almost 2 months later. The rule of thumb: for DBR < 7.0, use a 2.4.x artifact with _2.11 in its name; for DBR >= 7.0, use _2.12 and version 3.0.0 of the library.
You don't have all the dependencies installed. The connector depends on many other libraries that need to be available. It's better to specify the library by its Maven coordinates, org.mongodb.spark:mongo-spark-connector_2.11:2.4.0, which will pull in all the necessary dependencies.
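
The rule of thumb above can be sketched as a small lookup. This is only an illustration: `scala_binary_version`, `connector_coordinate`, and the version table are hypothetical helpers built from the versions named in this answer; they are not part of Databricks or the connector.

```python
# Connector version to pair with each Scala binary version,
# following the rule of thumb in this answer (an assumption, not an official table).
CONNECTOR_VERSIONS = {"2.11": "2.4.0", "2.12": "3.0.0"}


def scala_binary_version(scala_version: str) -> str:
    """Reduce a full Scala version such as '2.11.12' to its binary version '2.11'."""
    major, minor = scala_version.split(".")[:2]
    return f"{major}.{minor}"


def connector_coordinate(scala_version: str) -> str:
    """Return Maven coordinates (group:artifact:version) matching the cluster's Scala version."""
    binary = scala_binary_version(scala_version)
    if binary not in CONNECTOR_VERSIONS:
        raise ValueError(f"No connector version listed for Scala {binary}")
    return f"org.mongodb.spark:mongo-spark-connector_{binary}:{CONNECTOR_VERSIONS[binary]}"


print(connector_coordinate("2.11.12"))
print(connector_coordinate("2.12.10"))
```

In a Scala notebook cell, `scala.util.Properties.versionNumberString` reports the Scala version the cluster is running, which is the value to compare against the `_2.11` or `_2.12` suffix of the installed artifact.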
