GCS Hadoop connector error: ClassNotFoundException: com.google.api.client.http.HttpRequestInitializer ls: No FileSystem for scheme gs


Question


I am trying to set up hadoop-connectors on my local Ubuntu 20.04 and run the test command hadoop fs -ls gs://my-bucket, but I keep getting errors like the following:

$ hadoop fs -ls gs://my-bucket
2020-08-22 03:29:06,976 WARN fs.FileSystem: Cannot load filesystem: java.util.ServiceConfigurationError: org.apache.hadoop.fs.FileSystem: com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem Unable to get public no-arg constructor
2020-08-22 03:29:06,977 WARN fs.FileSystem: java.lang.NoClassDefFoundError: com/google/api/client/http/HttpRequestInitializer
2020-08-22 03:29:06,977 WARN fs.FileSystem: java.lang.ClassNotFoundException: com.google.api.client.http.HttpRequestInitializer
ls: No FileSystem for scheme "gs"

Note that I can access the bucket using gsutil ls gs://my-bucket.

I have downloaded gcs-connector-hadoop3-latest.jar from here and placed it in /usr/local/hadoop/share/hadoop/common/lib. Is this the right place for this jar file?
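For reference, a minimal sketch of fetching the jar into that directory (the URL is the publicly documented download location for the latest Hadoop 3 build of the connector; assumed here, adjust if it has moved):

cd /usr/local/hadoop/share/hadoop/common/lib
# download the latest Hadoop 3 build of the GCS connector (may need sudo for this directory)
wget https://storage.googleapis.com/hadoop-lib/gcs/gcs-connector-hadoop3-latest.jar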

I've configured core-site.xml with the properties listed here and also set GOOGLE_APPLICATION_CREDENTIALS to my service account key file. In hadoop-env.sh I've exported

export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64/
export HADOOP_CLASSPATH+="$HADOOP_CLASSPATH:$HADOOP_HOME/share/hadoop/common/lib/*.jar:$HADOOP_HOME/lib/*.jar"

Not sure if I've set HADOOP_CLASSPATH correctly and if hadoop recognizes the jar files inside /usr/local/hadoop/share/hadoop/common/lib? And what is the difference to /usr/local/hadoop/lib?

Here is the relevant content of core-site.xml:

<configuration>
<property>
  <name>fs.AbstractFileSystem.gs.impl</name>
  <value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS</value>
  <description>The AbstractFileSystem for gs: uris.</description>
</property>
<property>
  <name>fs.gs.project.id</name>
  <value>my-project-id</value>
  <description>
    Optional. Google Cloud Project ID with access to GCS buckets.
    Required only for list buckets and create bucket operations.
  </description>
</property>
<property>
  <name>google.cloud.auth.service.account.enable</name>
  <value>true</value>
  <description>
    Whether to use a service account for GCS authorization.
    Setting this property to `false` will disable use of service accounts for
    authentication.
  </description>
</property>
<property>
  <name>google.cloud.auth.service.account.json.keyfile</name>
  <value>/path/to/service-account.json</value>
  <description>
    The JSON key file of the service account used for GCS
    access when google.cloud.auth.service.account.enable is true.
  </description>
</property>
</configuration>
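As an aside, the connector's install documentation also lists an fs.gs.impl property next to fs.AbstractFileSystem.gs.impl, which maps the gs scheme for the plain FileSystem API; it is not present in the file above. A sketch of that entry, with the class name taken from the connector docs:

<property>
  <name>fs.gs.impl</name>
  <value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem</value>
  <description>The FileSystem for gs: (GCS) uris.</description>
</property>
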
$ java --version
openjdk 11.0.8 2020-07-14
OpenJDK Runtime Environment (build 11.0.8+10-post-Ubuntu-0ubuntu120.04)
OpenJDK 64-Bit Server VM (build 11.0.8+10-post-Ubuntu-0ubuntu120.04, mixed mode, sharing)
$ hadoop version
Hadoop 3.3.0
Source code repository https://gitbox.apache.org/repos/asf/hadoop.git -r aa96f1871bfd858f9bac59cf2a81ec470da649af
Compiled by brahma on 2020-07-06T18:44Z
Compiled with protoc 3.7.1
From source with checksum 5dc29b802d6ccd77b262ef9d04d19c4
This command was run using /usr/local/hadoop/share/hadoop/common/hadoop-common-3.3.0.jar

bashrc:

...

export PDSH_RCMD_TYPE=ssh

export HADOOP_HOME="/usr/local/hadoop"
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=${HADOOP_HOME}
export HADOOP_COMMON_HOME=${HADOOP_HOME}
export HADOOP_HDFS_HOME=${HADOOP_HOME}
export YARN_HOME=${HADOOP_HOME}

Answer 1

Score: 1


It seems that rebooting helped to solve the issue. After a reboot, the command hadoop fs -ls gs://my-bucket works and lists the contents of the bucket as expected.

Thanks to @IgorDvorzhak for providing the command hadoop classpath --glob to check whether gcs-connector-hadoop3-latest.jar can be found. I used:

hadoop classpath --glob | grep gcs-connector
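Since hadoop classpath --glob prints the expanded classpath as a single line, this command either echoes that line (when the connector jar is somewhere on it) or prints nothing; no output means Hadoop does not see the jar.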
