GCS Hadoop connector error: ClassNotFoundException: com.google.api.client.http.HttpRequestInitializer ls: No FileSystem for scheme gs
Question
I am trying to set up hadoop-connectors on my local Ubuntu 20.04 machine and am running the test command hadoop fs -ls gs://my-bucket, but I keep getting errors like the following:
$ hadoop fs -ls gs://my-bucket
2020-08-22 03:29:06,976 WARN fs.FileSystem: Cannot load filesystem: java.util.ServiceConfigurationError: org.apache.hadoop.fs.FileSystem: com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem Unable to get public no-arg constructor
2020-08-22 03:29:06,977 WARN fs.FileSystem: java.lang.NoClassDefFoundError: com/google/api/client/http/HttpRequestInitializer
2020-08-22 03:29:06,977 WARN fs.FileSystem: java.lang.ClassNotFoundException: com.google.api.client.http.HttpRequestInitializer
ls: No FileSystem for scheme "gs"
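One way to narrow down a NoClassDefFoundError like this (a sketch, using the jar path from this question): check whether the class the JVM cannot find is actually bundled inside the connector jar. Note that shaded builds of the connector may relocate the class under a repackaged name, so a miss here is not conclusive:

# List the jar contents and look for the missing class (sketch; path as used below).
unzip -l /usr/local/hadoop/share/hadoop/common/lib/gcs-connector-hadoop3-latest.jar | grep -i HttpRequestInitializer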
Note that I can access the bucket using gsutil ls gs://my-bucket.
I have downloaded gcs-connector-hadoop3-latest.jar from here and placed it inside /usr/local/hadoop/share/hadoop/common/lib. I hope this is the right place for this jar file?
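For reference, a sketch of the download-and-place steps. The URL below is the one commonly documented for the Hadoop 3 connector and is an assumption here; verify it against the page you downloaded from:

# Assumed URL: the commonly documented location of the Hadoop 3 GCS connector.
wget https://storage.googleapis.com/hadoop-lib/gcs/gcs-connector-hadoop3-latest.jar
# Copy it into Hadoop's common lib directory, as done in this question.
sudo cp gcs-connector-hadoop3-latest.jar /usr/local/hadoop/share/hadoop/common/lib/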
I've configured core-site.xml with the properties listed here and also set GOOGLE_APPLICATION_CREDENTIALS to my service account key file. In hadoop-env.sh I've exported:
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64/
export HADOOP_CLASSPATH+="$HADOOP_CLASSPATH:$HADOOP_HOME/share/hadoop/common/lib/*.jar:$HADOOP_HOME/lib/*.jar"
I'm not sure whether I've set HADOOP_CLASSPATH correctly, and whether hadoop actually picks up the jar files inside /usr/local/hadoop/share/hadoop/common/lib. How does that directory differ from /usr/local/hadoop/lib? A quick classpath check is sketched below.
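One way to check (a sketch): hadoop classpath prints the effective classpath as a single colon-separated line, so splitting on colons makes the match readable. Also worth noting: the JVM expands a classpath entry like dir/* to all jars in dir, but treats dir/*.jar as a literal file name, so the *.jar suffixes in the export above may not behave as intended.

# Print the expanded classpath, one entry per line, and look for the connector.
hadoop classpath --glob | tr ':' '\n' | grep gcs-connector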
Here is the relevant content of core-site.xml:
<configuration>
<property>
<name>fs.AbstractFileSystem.gs.impl</name>
<value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS</value>
<description>The AbstractFileSystem for gs: uris.</description>
</property>
<property>
<name>fs.gs.project.id</name>
<value>my-project-id</value>
<description>
Optional. Google Cloud Project ID with access to GCS buckets.
Required only for list buckets and create bucket operations.
</description>
</property>
<property>
<name>google.cloud.auth.service.account.enable</name>
<value>true</value>
<description>
Whether to use a service account for GCS authorization.
Setting this property to `false` will disable use of service accounts for
authentication.
</description>
</property>
<property>
<name>google.cloud.auth.service.account.json.keyfile</name>
<value>/path/to/service-account.json</value>
<description>
The JSON key file of the service account used for GCS
access when google.cloud.auth.service.account.enable is true.
</description>
</property>
</configuration>
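For comparison, the connector's documented configuration also includes an fs.gs.impl property, which maps the gs scheme for the (non-abstract) Hadoop FileSystem API. A minimal sketch of that property, which is absent from the file above:

<property>
<name>fs.gs.impl</name>
<value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem</value>
<description>The FileSystem implementation for gs: URIs.</description>
</property>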
$ java --version
openjdk 11.0.8 2020-07-14
OpenJDK Runtime Environment (build 11.0.8+10-post-Ubuntu-0ubuntu120.04)
OpenJDK 64-Bit Server VM (build 11.0.8+10-post-Ubuntu-0ubuntu120.04, mixed mode, sharing)
$ hadoop version
Hadoop 3.3.0
Source code repository https://gitbox.apache.org/repos/asf/hadoop.git -r aa96f1871bfd858f9bac59cf2a81ec470da649af
Compiled by brahma on 2020-07-06T18:44Z
Compiled with protoc 3.7.1
From source with checksum 5dc29b802d6ccd77b262ef9d04d19c4
This command was run using /usr/local/hadoop/share/hadoop/common/hadoop-common-3.3.0.jar
bashrc:
...
export PDSH_RCMD_TYPE=ssh
export HADOOP_HOME="/usr/local/hadoop"
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=${HADOOP_HOME}
export HADOOP_COMMON_HOME=${HADOOP_HOME}
export HADOOP_HDFS_HOME=${HADOOP_HOME}
export YARN_HOME=${HADOOP_HOME}
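A quick sanity check (sketch) that these bashrc settings are active in the shell running hadoop:

source ~/.bashrc
echo "$HADOOP_HOME"   # expected: /usr/local/hadoop
which hadoop          # expected: /usr/local/hadoop/bin/hadoop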
Answer 1
Score: 1
It seems that rebooting helped to solve the issue. After a reboot, the command hadoop fs -ls gs://my-bucket works and lists the contents of the bucket as expected.
Thanks to @IgorDvorzhak for providing the command hadoop classpath --glob to check whether gcs-connector-hadoop3-latest.jar can be found. I used:
hadoop classpath --glob | grep gcs-connector
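Since hadoop classpath --glob prints one long colon-separated line, splitting on colons makes the match easier to read. With the jar placed as in the question, the match should look something like this (sketch):

hadoop classpath --glob | tr ':' '\n' | grep gcs-connector
# /usr/local/hadoop/share/hadoop/common/lib/gcs-connector-hadoop3-latest.jar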