Databricks, Storage Account and VNet peering

Question

I have two virtual networks on Azure, and I have deployed the following for testing purposes:

  • a Storage Account (ADLS) was deployed behind vnet1
  • Databricks was deployed behind vnet2, with its 2 subnets (public and private)
  • there is a VNet peering between vnet1 and vnet2; the peering status is connected and in sync
  • the Databricks cluster gets a NIC with an IP address in the range of vnet2

When I try to access the abfss path of the storage account and run dbutils.fs.ls(abfss_url) to list its content, I get this error:

> "This request is not authorized to perform this operation.", 403

When I explicitly add vnet2 (the Databricks VNet) to the storage account firewall, it works.

The question is: how does VNet peering work here? Shouldn't the peering extend the network and let me access the storage account from the Databricks cluster without having to add vnet2 to the storage account firewall?

Answer 1

Score: 1

A storage account cannot be deployed directly into a VNet, because it is a PaaS service. In your setup you would be using either a service endpoint or a private endpoint, which is tied to a specific subnet within vnet1.

You deployed Databricks in a custom vnet2 and you also have vnet1. When you peer the two VNets, you can think of the result as one big network (call it vnet3) that is a superset of vnet1 and vnet2. But you still need to set up a service endpoint or a private endpoint for Databricks to talk to the storage account. Otherwise, by default, the request from Databricks does not travel over the peering at all: it goes to the storage account's public endpoint, and that is where it gets blocked, because the storage firewall only admits the subnets explicitly listed in its rules and peering does not add vnet2 to that list.
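To make the reasoning above concrete, here is a small toy model in Python. It is only a sketch of the decision logic as described in this answer, not an Azure API: the class names, the `vnet/subnet` string format, and the subnet names are all made up for illustration, and it assumes the firewall's default action is Deny and that a private endpoint is resolved correctly through DNS.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class StorageFirewall:
    """Toy model of a storage account firewall whose default action is Deny."""
    # Subnets explicitly allowed by virtual network rules, as "vnet/subnet" (hypothetical format).
    allowed_subnets: Set[str] = field(default_factory=set)
    # VNets that host a private endpoint for this storage account.
    private_endpoint_vnets: Set[str] = field(default_factory=set)


@dataclass
class Request:
    """Toy model of a call coming from a cluster node."""
    source_vnet: str
    source_subnet: str
    # VNets peered with the source VNet.
    peered_vnets: Set[str] = field(default_factory=set)


def is_allowed(fw: StorageFirewall, req: Request) -> bool:
    # Path 1: private endpoint. Its private IP is reachable from the VNet
    # that hosts it and from VNets peered with that VNet.
    reachable_vnets = {req.source_vnet} | req.peered_vnets
    if fw.private_endpoint_vnets & reachable_vnets:
        return True
    # Path 2: public endpoint. Only the caller's own subnet is checked
    # against the rules; peering plays no role in this check.
    return f"{req.source_vnet}/{req.source_subnet}" in fw.allowed_subnets


# A Databricks node in vnet2, which is peered with vnet1.
req = Request("vnet2", "private", peered_vnets={"vnet1"})

# Only a vnet1 subnet is allowed: peering alone does not help.
print(is_allowed(StorageFirewall(allowed_subnets={"vnet1/default"}), req))  # False
# The Databricks subnet is added to the firewall: the case that worked in the question.
print(is_allowed(StorageFirewall(allowed_subnets={"vnet2/private"}), req))  # True
# A private endpoint lives in vnet1: reachable from vnet2 through the peering.
print(is_allowed(StorageFirewall(private_endpoint_vnets={"vnet1"}), req))   # True
```

The first case is the situation in the question: the peering is healthy, yet the request is rejected because the firewall check never looks at peered VNets.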

You can refer to this article on how to use service or private endpoint.
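One way to see which path the cluster is actually taking is to check what the storage hostname resolves to from a notebook. The helper names below are hypothetical and this is only a diagnostic sketch using the Python standard library; it assumes the abfss URL points at a host of the form `<account>.dfs.core.windows.net`.

```python
import ipaddress
import socket
from typing import List


def resolve_endpoint(hostname: str) -> List[str]:
    """Return the distinct IP addresses the hostname resolves to."""
    infos = socket.getaddrinfo(hostname, 443, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def is_private(address: str) -> bool:
    """True if the address is in a private range, as a private endpoint IP would be."""
    return ipaddress.ip_address(address).is_private


# Example call from a notebook cell (replace the placeholder with your account name):
#   for ip in resolve_endpoint("<account>.dfs.core.windows.net"):
#       print(ip, "private" if is_private(ip) else "public")
```

If the name resolves to a public address, the request is going to the public endpoint and is subject to the storage firewall rules. With a private endpoint, you would expect a private address here, provided the private DNS zone is also visible from vnet2.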

huangapple
  • Posted on August 10, 2023, 16:34:26
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76873964.html