Databricks, Storage Account and VNet peering
Question
I have two virtual networks on Azure, and I have deployed the following for testing purposes:
- behind vnet1, a Storage Account (ADLS) is deployed
- behind vnet2, Databricks is deployed with its two subnets (public and private)
- between vnet1 and vnet2 there is VNet peering; the peering status is Connected and fully synced
- the Databricks cluster gets a NIC with an IP address in the range of vnet2
When I try to access the abfss path of the storage account and run `dbutils.fs.ls(abfss_url)` to list its contents, I get the error:
> This request is not authorized to perform this operation. (403)
When I explicitly allow vnet2 (the Databricks subnets) on the storage account firewall, it works.
The question is: How does VNet peering work here? Shouldn't VNet peering extend the actual network and let me access the storage account from the Databricks cluster without having to allow vnet2 on the storage account firewall?
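For context, the abfss URL used in the question follows a fixed scheme. A minimal sketch of how it is typically built; the account name, container, and path below are hypothetical placeholders, not values from the question:

```python
storage_account = "mystorageacct"   # hypothetical ADLS Gen2 account name
container = "mycontainer"           # hypothetical container (file system) name
path = "raw/2023"                   # hypothetical folder inside the container

# ADLS Gen2 URL scheme: abfss://<container>@<account>.dfs.core.windows.net/<path>
abfss_url = f"abfss://{container}@{storage_account}.dfs.core.windows.net/{path}"
print(abfss_url)

# On a Databricks cluster you would then list the folder with:
#   dbutils.fs.ls(abfss_url)
# which raises the 403 above when the storage firewall rejects the request.
```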
Answer 1

Score: 1
Storage cannot be deployed directly into a VNet because it is a PaaS service. Instead, you use either a service endpoint or a private endpoint, which is tied to a specific subnet within vnet1.
Since you have deployed your Databricks on a custom vnet2 and you have a vnet1, peering the two VNets means you can think of them as one big VNet (vnet3) that is a superset of vnet1 and vnet2. But you still need to set up a service endpoint or private endpoint for your Databricks to talk to your storage; otherwise, by default, the request from Databricks goes out to the internet and attempts to re-enter your vnet3, and this is where it gets blocked.
You can refer to this article on how to use a service or private endpoint.
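The routing point above can be illustrated with a small sketch: whether traffic stays inside the peered network depends on what IP address the storage hostname resolves to. Without a private endpoint, `<account>.dfs.core.windows.net` resolves to a public Azure address; with a private endpoint, the privatelink DNS zone maps the same name to an address inside vnet1. The helper and both IP addresses below are hypothetical examples, not values from the question:

```python
import ipaddress

def endpoint_visibility(resolved_ip: str) -> str:
    """Classify whether a resolved storage endpoint IP is private (traffic
    stays on the peered VNets) or public (traffic leaves the VNet and is
    filtered by the storage account firewall)."""
    return "private" if ipaddress.ip_address(resolved_ip).is_private else "public"

# Default PaaS endpoint: resolves to a public IP (hypothetical example value),
# so the request is subject to the storage firewall rules.
print(endpoint_visibility("20.60.1.1"))   # -> "public"

# With a private endpoint in vnet1, DNS returns an address from the VNet's
# own range (hypothetical example value), so traffic never leaves the network.
print(endpoint_visibility("10.1.0.4"))    # -> "private"
```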