Databricks Unity Catalog: multiple metastores for the same region
Question
We have three Databricks workspaces: one for dev, one for test, and one for production. All of them are in the same region, West Europe.
All of our data is in the data lake, meaning that the external tables in Databricks reference data in the data lake (Azure Data Lake Storage Gen2).
Each workspace therefore has a different data lake associated with it, since each serves a different environment.
This does not fit the usual Unity Catalog use case, in which multiple workspaces refer to the same metastore: we would have different access requirements, as well as different data, for each environment. In some cases, certain tables may exist in the lower environments but not in prod.
Also, looking here, I see the following sentence:
You can create one metastore per region and attach it to any number of workspaces in that region.
All of our Databricks workspaces (for the different environments) are in the same region, but in different subscriptions.
Does this mean that Unity Catalog does not apply to this use case? Applying it would mean creating three different metastores for the same region.
If it does not apply, how can we get goodies like:
- Terraform capabilities that are only available with Unity Catalog, e.g. creating a schema
- Data lineage
Answer 1
Score: 2
This is how Unity Catalog works (at least right now): each region can have only one Unity Catalog metastore, and all workspaces in that region can be attached to it.
For now, environment separation can be handled with user groups. You can also configure the Azure Storage firewall so that each environment's storage is reachable only from the workspaces that belong to that environment.
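As a rough illustration of the group-based separation, here is a minimal Python sketch that generates Unity Catalog `GRANT` statements, assuming one catalog per environment inside the shared metastore. The catalog names (`dev`, `test`, `prod`), the group names, and the `grant_statements` helper are hypothetical and not taken from the question; the privilege set is just one plausible read-only choice.

```python
# Hypothetical mapping of one catalog per environment to the groups allowed
# to read it. All names here are illustrative assumptions.
ENV_GROUPS = {
    "dev": ["dev-engineers"],
    "test": ["test-engineers"],
    "prod": ["prod-readers"],
}

# Read-only privileges; adjust to the access each environment actually needs.
READ_PRIVILEGES = ("USE CATALOG", "USE SCHEMA", "SELECT")


def grant_statements(env_groups, privileges=READ_PRIVILEGES):
    """Build one GRANT statement per (catalog, group) pair."""
    privs = ", ".join(privileges)
    statements = []
    for catalog, groups in env_groups.items():
        for group in groups:
            statements.append(
                f"GRANT {privs} ON CATALOG `{catalog}` TO `{group}`"
            )
    return statements


if __name__ == "__main__":
    # Print the statements so they can be reviewed before being executed.
    for stmt in grant_statements(ENV_GROUPS):
        print(stmt)
```

The generated statements could then be run by a metastore admin, for example with `spark.sql(stmt)` in a notebook. Because each group is granted access only to its own environment's catalog, a user in the dev group would not see prod tables even though all three workspaces share the same metastore.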
Later this year, a feature is expected that will allow specific catalogs to be attached only to specific workspaces, so you can cleanly separate environments. It was mentioned in last quarter's product roadmap, and you can attend the upcoming product roadmap webinar to get more updates about Unity Catalog.