How to resolve an 'Unable to get public no-arg constructor' error while pushing data to GCS and loading it into BigQuery?

Question

I've set up a PySpark session and given it the following configuration, based on what I've read:

self.spark_session = (
    SparkSession.builder.appName("Example Session")
    .config("spark.jars", "../../.jars/spark-bigquery-with-dependencies_2.13-0.28.0.jar")
    .config("spark.hadoop.fs.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem")
    .config("spark.driver.extraClassPath", "../../.jars/gcs-connector-hadoop3-latest.jar")
    .config("spark.executor.extraClassPath", "../../.jars/gcs-connector-hadoop3-latest.jar")
    .getOrCreate()
)

I can load the dataset and transform it without any problems. The error appears when I write the DataFrame to BigQuery, which first stages the data in a GCS bucket:

dataframe.write.format("bigquery").option("temporaryGcsBucket", bucket_path).save(table_name)

The error I receive is:

E                   py4j.protocol.Py4JJavaError: An error occurred while calling o65.json.
E                   : java.util.ServiceConfigurationError: org.apache.spark.sql.sources.DataSourceRegister: com.google.cloud.spark.bigquery.BigQueryRelationProvider Unable to get public no-arg constructor
E                   	at java.base/java.util.ServiceLoader.fail(ServiceLoader.java:582)
E                   	at java.base/java.util.ServiceLoader.getConstructor(ServiceLoader.java:673)
E                   	at java.base/java.util.ServiceLoader$LazyClassPathLookupIterator.hasNextService(ServiceLoader.java:1233)
E                   	at java.base/java.util.ServiceLoader$LazyClassPathLookupIterator.hasNext(ServiceLoader.java:1265)
E                   	at java.base/java.util.ServiceLoader$2.hasNext(ServiceLoader.java:1300)
E                   	at java.base/java.util.ServiceLoader$3.hasNext(ServiceLoader.java:1385)
E                   	at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:45)
E                   	at scala.collection.Iterator.foreach(Iterator.scala:943)
E                   	at scala.collection.Iterator.foreach$(Iterator.scala:943)
E                   	at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
E                   	at scala.collection.IterableLike.foreach(IterableLike.scala:74)
E                   	at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
E                   	at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
E                   	at scala.collection.TraversableLike.filterImpl(TraversableLike.scala:303)
E                   	at scala.collection.TraversableLike.filterImpl$(TraversableLike.scala:297)
E                   	at scala.collection.AbstractTraversable.filterImpl(Traversable.scala:108)
E                   	at scala.collection.TraversableLike.filter(TraversableLike.scala:395)
E                   	at scala.collection.TraversableLike.filter$(TraversableLike.scala:395)
E                   	at scala.collection.AbstractTraversable.filter(Traversable.scala:108)
E                   	at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:657)
E                   	at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSourceV2(DataSource.scala:725)
E                   	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:207)
E                   	at org.apache.spark.sql.DataFrameReader.json(DataFrameReader.scala:361)
E                   	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
E                   	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
E                   	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
E                   	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
E                   	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
E                   	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
E                   	at py4j.Gateway.invoke(Gateway.java:282)
E                   	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
E                   	at py4j.commands.CallCommand.execute(CallCommand.java:79)
E                   	at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
E                   	at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
E                   	at java.base/java.lang.Thread.run(Thread.java:829)
E                   Caused by: java.lang.NoClassDefFoundError: scala/$less$colon$less
E                   	at java.base/java.lang.Class.getDeclaredConstructors0(Native Method)
E                   	at java.base/java.lang.Class.privateGetDeclaredConstructors(Class.java:3137)
E                   	at java.base/java.lang.Class.getConstructor0(Class.java:3342)
E                   	at java.base/java.lang.Class.getConstructor(Class.java:2151)
E                   	at java.base/java.util.ServiceLoader$1.run(ServiceLoader.java:660)
E                   	at java.base/java.util.ServiceLoader$1.run(ServiceLoader.java:657)
E                   	at java.base/java.security.AccessController.doPrivileged(Native Method)
E                   	at java.base/java.util.ServiceLoader.getConstructor(ServiceLoader.java:668)
E                   	... 33 more
E                   Caused by: java.lang.ClassNotFoundException: scala.$less$colon$less
E                   	at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:476)
E                   	at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:589)
E                   	at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
E                   	... 41 more


I've searched everywhere but am unsure how to resolve this. My best guess is that I'm missing another .jar file, but I'm not sure which one.
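A hint about which jar is involved is the innermost `Caused by: java.lang.NoClassDefFoundError: scala/$less$colon$less`. `scala.$less$colon$less` is the compiled name of `scala.<:<`, which is a top-level class only from Scala 2.13 on (in Scala 2.12 it is nested inside `Predef`), while frames such as `scala.collection.TraversableLike` in the same trace come from the Scala 2.12 library. That pattern usually means a jar built for Scala 2.13, like the `_2.13` connector in the configuration above, is being loaded into a Spark build that uses Scala 2.12. The sketch below is a hypothetical helper, not part of PySpark or the connector: it compares the Scala suffix in a jar file name with the Scala version the driver JVM reports. Reading that version goes through PySpark's private `_jvm` gateway, which is an assumption that may not hold across releases.

```python
import re
from typing import Optional

# Matches the Scala binary-version suffix in artifact names such as
# "spark-bigquery-with-dependencies_2.13-0.28.0.jar" (captures "2.13").
_SCALA_SUFFIX = re.compile(r"_(2\.1[0-3])-")


def jar_scala_version(jar_name: str) -> Optional[str]:
    """Return the Scala binary version encoded in a jar file name, or None."""
    match = _SCALA_SUFFIX.search(jar_name)
    return match.group(1) if match else None


def scala_binary_version(full_version: str) -> str:
    """Reduce a full Scala version such as '2.12.15' to its binary version '2.12'."""
    major, minor = full_version.split(".")[:2]
    return f"{major}.{minor}"


def is_compatible(jar_name: str, spark_scala_version: str) -> bool:
    """True if the jar carries no Scala suffix or its suffix matches Spark's Scala."""
    suffix = jar_scala_version(jar_name)
    return suffix is None or suffix == scala_binary_version(spark_scala_version)


def spark_scala_version(spark) -> str:
    """Ask the driver JVM for its Scala version.

    Assumption: relies on PySpark's private `_jvm` py4j handle.
    """
    return spark.sparkContext._jvm.scala.util.Properties.versionNumberString()
```

With a running session, `is_compatible("spark-bigquery-with-dependencies_2.13-0.28.0.jar", spark_scala_version(spark))` returns `False` whenever the JVM reports a 2.12.x Scala version, which points at the connector jar rather than a missing one.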

Answer 1

Score: 0

The issue was a missing jar file. After adding the correct jar (as mentioned in this SO question), it appears to be working.
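The answer does not say which jar was added. One plausible reading of the trace, stated here as an assumption: the BigQuery connector's Scala suffix (`_2.13`) does not match the Scala version of the Spark build (2.12), so the "correct jar" would be the connector built for Scala 2.12. A minimal sketch under those assumptions, reusing the `../../.jars/` directory and connector version 0.28.0 from the question; the `_2.12` file name is assumed, so match it to the jar you actually download. `session_config` and `build_session` are hypothetical helper names.

```python
from typing import Dict

# Assumptions: same jar directory and connector version as in the question.
JARS_DIR = "../../.jars"
CONNECTOR_VERSION = "0.28.0"


def session_config(scala_binary: str = "2.12") -> Dict[str, str]:
    """Spark settings that pick the BigQuery connector built for `scala_binary`."""
    bq_jar = (f"{JARS_DIR}/spark-bigquery-with-dependencies_"
              f"{scala_binary}-{CONNECTOR_VERSION}.jar")
    gcs_jar = f"{JARS_DIR}/gcs-connector-hadoop3-latest.jar"
    return {
        # spark.jars is a comma-separated list of jars placed on the
        # driver and executor classpaths.
        "spark.jars": f"{bq_jar},{gcs_jar}",
        "spark.hadoop.fs.gs.impl":
            "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem",
    }


def build_session(app_name: str = "Example Session", scala_binary: str = "2.12"):
    """Create the SparkSession; pyspark is imported lazily inside the function."""
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in session_config(scala_binary).items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

Listing both jars in `spark.jars` simplifies the question's setup; the original `spark.driver.extraClassPath` and `spark.executor.extraClassPath` entries for the GCS connector can be kept instead. Another option is `spark.jars.packages` with the Maven coordinate `com.google.cloud.spark:spark-bigquery-with-dependencies_2.12:0.28.0`, which lets Spark download the matching build.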

huangapple
  • Posted on February 18, 2023 at 01:16:36
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75487370.html