Databricks file trigger - how to whitelist storage firewall

Question
Recently, Databricks added a new feature - file triggers. However, this functionality seems to require the storage account to allow all network traffic.
My storage account has a firewall configured that denies traffic from unknown sources. The Databricks workspace is deployed to our internal network - we are using VNet injection. All necessary subnets are whitelisted, and storage generally works fine, but not with a file trigger.
If I turn off the storage firewall, the file trigger works fine.
The external location and Azure Databricks access connector are configured correctly.
The error I get:
> Invalid credentials for storage location abfss://<container>@<storage>.dfs.core.windows.net/. The credentials for the external location in the Unity Catalog cannot be used to read the files from the configured path. Please grant the required permissions.
If I look at the logs in my storage account, it appears that the file trigger lists the storage account from a private IP address starting with 10.120.x.x.
How do I whitelist this service? I want to keep my storage behind the firewall.
Answer 1
Score: 2
Update, 3rd April 2023: the ADLS firewall isn't supported out of the box right now; work is in progress to solve this issue.
It's described in the documentation - you need to:
- Create a managed identity by creating a Databricks Access Connector
- Give this managed identity permission to access your storage account
- Create a UC external location using the managed identity
- Grant the access connector access to your storage account - in "Networking", select "Resource instances", then select a Resource type of Microsoft.Databricks/accessConnectors and select your Azure Databricks access connector.
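The steps above can also be sketched with the Azure CLI. This is a minimal sketch, not a definitive recipe: the resource group, connector, storage account, and region names are placeholders, the `az databricks access-connector` commands require the `databricks` CLI extension, and the external-location step still happens in Databricks itself (UI or Databricks CLI):

```shell
# Placeholder names - substitute your own (assumptions, not from the original post).
RG=my-resource-group
CONNECTOR=my-access-connector
STORAGE=mystorageaccount

# 1. Create a Databricks Access Connector with a system-assigned managed identity.
az databricks access-connector create \
  --resource-group "$RG" --name "$CONNECTOR" \
  --location westeurope --identity-type SystemAssigned

# 2. Grant the connector's managed identity access to the storage account.
PRINCIPAL_ID=$(az databricks access-connector show \
  --resource-group "$RG" --name "$CONNECTOR" \
  --query identity.principalId -o tsv)
STORAGE_ID=$(az storage account show \
  --resource-group "$RG" --name "$STORAGE" --query id -o tsv)
az role assignment create \
  --assignee "$PRINCIPAL_ID" \
  --role "Storage Blob Data Contributor" \
  --scope "$STORAGE_ID"

# 3. Create the UC storage credential and external location in Databricks,
#    backed by this access connector (done in the Databricks UI or CLI).

# 4. Add the connector as a trusted resource instance on the storage firewall
#    (the "Resource instances" setting from the answer above).
CONNECTOR_ID=$(az databricks access-connector show \
  --resource-group "$RG" --name "$CONNECTOR" --query id -o tsv)
az storage account network-rule add \
  --resource-group "$RG" --account-name "$STORAGE" \
  --resource-id "$CONNECTOR_ID" \
  --tenant-id "$(az account show --query tenantId -o tsv)"
```

Step 4 is the part that lets the file trigger through while the firewall stays on: a resource instance rule admits traffic from that specific access connector rather than opening the account to all networks.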
Comments