Failed to connect to service endpoint when reading file from s3 using Spark and Java.

I need to read a file from an S3 bucket into a Spark Dataset. I used the correct secretKey and accessKey, and I also tried setting the endpoint configuration, but I get this error:

[main] WARN com.amazonaws.internal.InstanceMetadataServiceResourceFetcher - Fail to retrieve token
com.amazonaws.SdkClientException: Failed to connect to service endpoint:
    at com.amazonaws.internal.EC2ResourceFetcher.doReadResource(EC2ResourceFetcher.java:100)
    at com.amazonaws.internal.InstanceMetadataServiceResourceFetcher.getToken(InstanceMetadataServiceResourceFetcher.java:91)
    ... 74 more

java.nio.file.AccessDeniedException: datalakedbr: org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: No AWS Credentials provided by SimpleAWSCredentialsProvider EnvironmentVariableCredentialsProvider InstanceProfileCredentialsProvider : com.amazonaws.SdkClientException: Failed to connect to service endpoint:
    at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:187)
    at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:111)
    at org.apache.hadoop.fs.s3a.Invoker.lambda$retry$3(Invoker.java:265)
    at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:322)
    at org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:261)
Caused by: org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: No AWS Credentials provided by SimpleAWSCredentialsProvider EnvironmentVariableCredentialsProvider InstanceProfileCredentialsProvider : com.amazonaws.SdkClientException: Failed to connect to service endpoint:
    at org.apache.hadoop.fs.s3a.AWSCredentialProviderList.getCredentials(AWSCredentialProviderList.java:159)

This is the code I used:

import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

// Build a local session, passing the S3 credentials as configuration options.
SparkSession sparkSession = SparkSession.builder()
        .master("local").appName("readFile")
        .config("fs.s3a.awsAccessKeyId", "key")
        .config("fs.s3a.awsSecretAccessKey", "secretKey")
        .getOrCreate();
JavaSparkContext sparkContext = new JavaSparkContext(sparkSession.sparkContext());

String path = "s3a://bucket/path.json";
Dataset<Row> file = sparkSession.sqlContext().read().load(path);

Can anyone help?

Answer 1

Score: 7

I believe that the problem is with the name of the property.

Check the Hadoop documentation here:
https://hadoop.apache.org/docs/r2.7.2/hadoop-aws/tools/hadoop-aws/index.html

It says that for S3A, the property names should be fs.s3a.access.key / fs.s3a.secret.key, not fs.s3a.awsAccessKeyId / fs.s3a.awsSecretAccessKey.

Other options are fs.s3.awsAccessKeyId for S3, or fs.s3n.awsAccessKeyId for S3N.
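
For reference, here is a minimal sketch of the corrected setup. The bucket path and credential values are the placeholders from the question, the ReadFromS3 class is just a hypothetical wrapper so the snippet compiles standalone, and the endpoint line is only an example for a non-default region. The spark.hadoop. prefix is a common way to route these options into the Hadoop configuration that the s3a connector reads.

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class ReadFromS3 {
    public static void main(String[] args) {
        SparkSession sparkSession = SparkSession.builder()
                .master("local").appName("readFile")
                // S3A property names from the Hadoop docs; the "spark.hadoop."
                // prefix copies them into the Hadoop Configuration that s3a reads.
                .config("spark.hadoop.fs.s3a.access.key", "key")
                .config("spark.hadoop.fs.s3a.secret.key", "secretKey")
                // Optional, only needed for a non-default endpoint (example value):
                // .config("spark.hadoop.fs.s3a.endpoint", "s3.eu-west-1.amazonaws.com")
                .getOrCreate();

        // read().json() names the format explicitly; a plain read().load()
        // defaults to Parquet, so a JSON file should be loaded as JSON.
        String path = "s3a://bucket/path.json";
        Dataset<Row> file = sparkSession.read().json(path);
        file.show();
    }
}

Equivalently, the keys can be set after startup with sparkSession.sparkContext().hadoopConfiguration().set("fs.s3a.access.key", "key"), in which case no spark.hadoop. prefix is needed.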
