How to resolve an 'Unable to get public no-arg constructor' error when pushing data to GCS and loading it into BigQuery?
Question
I've set up a PySpark session and given it specific configuration settings based on what I've read:
from pyspark.sql import SparkSession

self.spark_session = (
    SparkSession.builder.appName("Example Session")
    # BigQuery connector (the _2.13 suffix is the Scala version it was built for)
    .config("spark.jars", "../../.jars/spark-bigquery-with-dependencies_2.13-0.28.0.jar")
    # Register the GCS connector as the handler for gs:// paths
    .config("spark.hadoop.fs.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem")
    .config("spark.driver.extraClassPath", "../../.jars/gcs-connector-hadoop3-latest.jar")
    .config("spark.executor.extraClassPath", "../../.jars/gcs-connector-hadoop3-latest.jar")
    .getOrCreate()
)
I can work with the dataset I pull in without problems, transforming the data and so on. The error appears when I try to write to GCS as the staging step for loading into BigQuery:
dataframe.write.format("bigquery").option("temporaryGcsBucket", bucket_path).save(table_name)
The error I receive is:
E py4j.protocol.Py4JJavaError: An error occurred while calling o65.json.
E : java.util.ServiceConfigurationError: org.apache.spark.sql.sources.DataSourceRegister: com.google.cloud.spark.bigquery.BigQueryRelationProvider Unable to get public no-arg constructor
E at java.base/java.util.ServiceLoader.fail(ServiceLoader.java:582)
E at java.base/java.util.ServiceLoader.getConstructor(ServiceLoader.java:673)
E at java.base/java.util.ServiceLoader$LazyClassPathLookupIterator.hasNextService(ServiceLoader.java:1233)
E at java.base/java.util.ServiceLoader$LazyClassPathLookupIterator.hasNext(ServiceLoader.java:1265)
E at java.base/java.util.ServiceLoader$2.hasNext(ServiceLoader.java:1300)
E at java.base/java.util.ServiceLoader$3.hasNext(ServiceLoader.java:1385)
E at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:45)
E at scala.collection.Iterator.foreach(Iterator.scala:943)
E at scala.collection.Iterator.foreach$(Iterator.scala:943)
E at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
E at scala.collection.IterableLike.foreach(IterableLike.scala:74)
E at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
E at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
E at scala.collection.TraversableLike.filterImpl(TraversableLike.scala:303)
E at scala.collection.TraversableLike.filterImpl$(TraversableLike.scala:297)
E at scala.collection.AbstractTraversable.filterImpl(Traversable.scala:108)
E at scala.collection.TraversableLike.filter(TraversableLike.scala:395)
E at scala.collection.TraversableLike.filter$(TraversableLike.scala:395)
E at scala.collection.AbstractTraversable.filter(Traversable.scala:108)
E at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:657)
E at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSourceV2(DataSource.scala:725)
E at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:207)
E at org.apache.spark.sql.DataFrameReader.json(DataFrameReader.scala:361)
E at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
E at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
E at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
E at java.base/java.lang.reflect.Method.invoke(Method.java:566)
E at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
E at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
E at py4j.Gateway.invoke(Gateway.java:282)
E at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
E at py4j.commands.CallCommand.execute(CallCommand.java:79)
E at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
E at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
E at java.base/java.lang.Thread.run(Thread.java:829)
E Caused by: java.lang.NoClassDefFoundError: scala/$less$colon$less
E at java.base/java.lang.Class.getDeclaredConstructors0(Native Method)
E at java.base/java.lang.Class.privateGetDeclaredConstructors(Class.java:3137)
E at java.base/java.lang.Class.getConstructor0(Class.java:3342)
E at java.base/java.lang.Class.getConstructor(Class.java:2151)
E at java.base/java.util.ServiceLoader$1.run(ServiceLoader.java:660)
E at java.base/java.util.ServiceLoader$1.run(ServiceLoader.java:657)
E at java.base/java.security.AccessController.doPrivileged(Native Method)
E at java.base/java.util.ServiceLoader.getConstructor(ServiceLoader.java:668)
E ... 33 more
E Caused by: java.lang.ClassNotFoundException: scala.$less$colon$less
E at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:476)
E at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:589)
E at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
E ... 41 more
I've searched extensively but I'm unsure how to resolve this. My best guess is that I'm missing another .jar file, but I don't know which one.
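The innermost `Caused by:` line is usually the one that names the real problem. Scala encodes operator characters in JVM class names, so `scala.$less$colon$less` is the class `scala.<:<`, which is a top-level class only in Scala 2.13 (in Scala 2.12 it is nested in `scala.Predef`). The rest of the trace runs through Scala 2.12 collection internals such as `scala.collection.TraversableLike.filterImpl`, so the pattern is consistent with a connector jar built for Scala 2.13 being loaded by a Scala 2.12 Spark runtime. A small sketch for pulling that out of a pasted trace; the helper names are hypothetical, and the decoding table covers the standard Scala operator encodings:

```python
import re
from typing import Optional, Tuple

# Operator encodings the Scala compiler uses in JVM class and method names.
_SCALA_OPS = {
    "tilde": "~", "eq": "=", "less": "<", "greater": ">", "bang": "!",
    "hash": "#", "percent": "%", "up": "^", "amp": "&", "bar": "|",
    "times": "*", "div": "/", "plus": "+", "minus": "-", "colon": ":",
    "bslash": "\\", "qmark": "?", "at": "@",
}
# Longest names first so a shorter code can never shadow a longer one.
_OP_PATTERN = re.compile(
    r"\$(" + "|".join(sorted(_SCALA_OPS, key=len, reverse=True)) + r")"
)


def decode_scala_name(name: str) -> str:
    """Decode an encoded name, e.g. 'scala.$less$colon$less' -> 'scala.<:<'."""
    return _OP_PATTERN.sub(lambda m: _SCALA_OPS[m.group(1)], name)


def innermost_cause(trace: str) -> Optional[Tuple[str, str]]:
    """Return (exception class, decoded message) from the last 'Caused by:' line."""
    causes = re.findall(r"Caused by:\s*([\w.$]+)(?::\s*(.*))?", trace)
    if not causes:
        return None
    exc_class, message = causes[-1]
    return exc_class, decode_scala_name(message.strip())


if __name__ == "__main__":
    # Two lines copied from the trace above, plus one frame after each.
    trace = (
        "E Caused by: java.lang.NoClassDefFoundError: scala/$less$colon$less\n"
        "E at java.base/java.lang.Class.getDeclaredConstructors0(Native Method)\n"
        "E Caused by: java.lang.ClassNotFoundException: scala.$less$colon$less\n"
        "E at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:476)\n"
    )
    print(innermost_cause(trace))
    # prints ('java.lang.ClassNotFoundException', 'scala.<:<')
```

Because Spark instantiates every registered data source while resolving any format name, a mismatched connector jar can also break unrelated calls, which matches this trace being raised from `DataFrameReader.json` (`o65.json`) rather than from the BigQuery write itself.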
Answer 1
Score: 0
The issue was a missing jar file. After adding the correct jar (as mentioned in this SO question), it appears to be working.
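Which jar is the correct one depends on the Scala version Spark was built with: the `_2.13` in `spark-bigquery-with-dependencies_2.13-0.28.0.jar` is the Scala binary version the connector was compiled for, and it has to match the runtime. A minimal sketch of that check, assuming the usual `<artifact>_<scala-binary>-<version>.jar` file naming (the helper names are hypothetical):

```python
import re
from typing import Optional

# Captures the Scala binary suffix in names such as
# "spark-bigquery-with-dependencies_2.13-0.28.0.jar" -> "2.13".
# Assumes the common "<artifact>_<scala-binary>-<version>.jar" naming.
_SCALA_SUFFIX = re.compile(r"_(\d+(?:\.\d+)?)-[^_/]*\.jar$")


def scala_binary_suffix(jar_name: str) -> Optional[str]:
    """Return the Scala binary version encoded in a jar file name, or None."""
    match = _SCALA_SUFFIX.search(jar_name)
    return match.group(1) if match else None


def scala_suffix_matches(jar_name: str, runtime_scala_version: str) -> bool:
    """Compare a jar's Scala suffix with the runtime's major.minor version.

    Jars without a Scala suffix (e.g. the Java-only GCS connector) are
    treated as compatible, since there is nothing to compare.
    """
    suffix = scala_binary_suffix(jar_name)
    if suffix is None:
        return True
    # "2.12.15" -> "2.12"
    runtime_binary = ".".join(runtime_scala_version.split(".")[:2])
    return suffix == runtime_binary


if __name__ == "__main__":
    jar = "../../.jars/spark-bigquery-with-dependencies_2.13-0.28.0.jar"
    # "2.12.15" is an example runtime version, not taken from the question.
    print(scala_binary_suffix(jar), scala_suffix_matches(jar, "2.12.15"))
    # prints: 2.13 False
```

On a live session, the runtime Scala version can usually be read through PySpark's internal JVM gateway, e.g. `spark.sparkContext._jvm.scala.util.Properties.versionNumberString()`; `_jvm` is a private attribute, so treat that call as an assumption rather than a stable API.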