Error loading data from S3 bucket to Databricks External Table

Question

Using an example I found online, the code below throws an error because it cannot read from the S3 bucket. The problem is that I have to pass in the AWS credentials, which are stored in the variable S3_dir along with the bucket path. I am unable to get this to work.

    %sql
    DROP TABLE IF EXISTS mydb.bigtable;
    CREATE TABLE mydb.bigtable
    (
      id BIGINT,
      string1 STRING,
      numbers BIGINT,
    ) USING delta
    LOCATION '/user/hive/warehouse/mydb'
    SELECT * from delta.f"{S3_dir}";

Answer 1

Score: 1

In order to use any data source from storage (such as AWS S3 or Azure ADLS), you need to mount the S3 bucket in Databricks. Here is the step-by-step procedure:

Step 1: Create AWS Access Key and Secret Key for Databricks

  • Log in to your AWS account
  • Select Security credentials under your profile name [top right corner]
  • Look for Access keys (scroll down) -> create and download the generated CSV file

Note: This file contains your access key and secret key.
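
If you prefer to script this step rather than click through the console, a minimal sketch with boto3 might look like the following. The IAM user name "databricks-user" is a hypothetical placeholder, and your environment is assumed to already have credentials with permission to create access keys:

    # Minimal sketch: create an access key pair for an existing IAM user via boto3.
    # "databricks-user" is a hypothetical user name - replace it with your own IAM user.
    import boto3

    iam = boto3.client("iam")
    response = iam.create_access_key(UserName="databricks-user")

    access_key = response["AccessKey"]["AccessKeyId"]
    secret_key = response["AccessKey"]["SecretAccessKey"]

    # The secret key cannot be retrieved again later, so store it securely now.
    print(access_key, secret_key)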

Step 2: Log in to Databricks and run the following code

Create a notebook called “mount_s3_dbfs”

    import urllib.parse

    # 1. Add the necessary information
    ACCESS_KEY = "<access_key>"
    SECRET_KEY = "<secret_key>"
    S3_BUCKET = "<s3bucket_name>"

    # 2. URL-encode your secret key
    ENCODED_SECRET_KEY = urllib.parse.quote(string=SECRET_KEY, safe="")

    # 3. Define the URL
    URL = "s3a://{}:{}@{}".format(ACCESS_KEY, ENCODED_SECRET_KEY, S3_BUCKET)

    # 4. Mount your S3 bucket with DBFS
    dbutils.fs.mount(URL, f"/mnt/{S3_BUCKET}")

    # 5. Print the mounted folder contents
    dbutils.fs.ls(f"/mnt/{S3_BUCKET}/")
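
One practical note: if you re-run the notebook, dbutils.fs.mount can fail because the mount point already exists. A small guard such as the following sketch (reusing the URL and S3_BUCKET variables from above) handles that case:

    # Minimal sketch: unmount first if the mount point already exists, then mount.
    mount_point = f"/mnt/{S3_BUCKET}"

    if any(m.mountPoint == mount_point for m in dbutils.fs.mounts()):
        dbutils.fs.unmount(mount_point)

    dbutils.fs.mount(URL, mount_point)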

Note: Providing your credentials directly is not recommended; try uploading them to DBFS and reading them with Spark's CSV reader.
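
For example, a rough sketch of that approach, run in a Databricks notebook (where spark is predefined), assuming you uploaded the CSV from Step 1 to a hypothetical DBFS path /FileStore/tables/aws_keys.csv and that it keeps the AWS console's default column names:

    # Minimal sketch: load the credentials CSV from DBFS instead of hard-coding keys.
    # The path and the column names ("Access key ID", "Secret access key") are
    # assumptions - adjust them to match your uploaded file.
    import urllib.parse

    keys_df = spark.read.csv("/FileStore/tables/aws_keys.csv", header=True)
    row = keys_df.first()

    ACCESS_KEY = row["Access key ID"]
    SECRET_KEY = row["Secret access key"]
    ENCODED_SECRET_KEY = urllib.parse.quote(string=SECRET_KEY, safe="")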

Step 3: Use Spark to read and write files from AWS

    %sql
    DROP TABLE IF EXISTS mydb.bigtable;
    CREATE TABLE mydb.bigtable
    (
      id BIGINT,
      string1 STRING,
      numbers BIGINT
    ) USING delta
    LOCATION '/user/hive/warehouse/mydb'
    SELECT * FROM delta.`path/to/s3file`;

Note: Use backticks (`) around your S3 file location.
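
To tie this back to the original question: once the bucket is mounted, the table can point at the mount path instead of an s3a:// URI with embedded credentials. Below is a rough Python equivalent of the SQL above, assuming the Delta files sit in a hypothetical folder named "bigtable" inside the mounted bucket (note the AS before SELECT, since the CTAS form takes its schema from the query):

    # Minimal sketch: create the external table from the mounted S3 path.
    # S3_BUCKET matches the mount from Step 2; "bigtable" is a hypothetical
    # folder name inside the bucket - replace it with your actual Delta path.
    source_path = f"/mnt/{S3_BUCKET}/bigtable"

    spark.sql("DROP TABLE IF EXISTS mydb.bigtable")
    spark.sql(f"""
        CREATE TABLE mydb.bigtable
        USING delta
        LOCATION '/user/hive/warehouse/mydb'
        AS SELECT * FROM delta.`{source_path}`
    """)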
