Azure Databricks is unable to create an Event Grid Subscription for Autoloader Streams
Question
I am trying to create an autoloader stream in Azure Databricks.
Now, when I try to start the writeStream, I am presented with an exception saying:
>com.databricks.sql.cloudfiles.errors.CloudFilesException: Failed to create an Event Grid subscription. Please make sure that your service principal has 'write' permissions (e.g., assign it a Contributor role) on the storage account rahulstorageek in order to create Event Grid Subscriptions
Following is the code I am using:
```python
# Spark and cloudFiles configurations
spark.conf.set(
    "fs.azure.account.key.<My ADLS Gen2 Storage account name>.blob.core.windows.net",
    "<Access Key 2 of my Storage Account>")

queuesas = "<SAS Connection String for Queue Storage>"

cloudfilesConf = {
    "cloudFiles.subscriptionId": "<Azure Free Trial Subscription Id>",
    "cloudFiles.connectionString": queuesas,
    "cloudFiles.format": "csv",
    "cloudFiles.tenantId": "<Service Principal's tenant Id>",
    "cloudFiles.clientId": "<Service Principal's client Id>",
    "cloudFiles.clientSecret": "<Service Principal's generated client secret Value>",
    "cloudFiles.resourceGroup": "AzureDataBricks_Exploration_RG",
    "cloudFiles.useNotifications": "true",
    "cloudFiles.includeExistingFiles": "true",
    "cloudFiles.validateOptions": "true",
}

# Creating a manual schema for the incoming data
from pyspark.sql.functions import *
from pyspark.sql.types import *

dataset_schema = StructType([
    StructField("YearsExperience", DoubleType(), True),
    StructField("Salary", IntegerType(), True)])

# Autoloader ReadStream
autoloader_df = (spark.readStream.format("cloudFiles")
                 .options(**cloudfilesConf)
                 .option("recursiveFileLookup", "true")
                 .schema(dataset_schema)
                 .load("/mnt/autoloadersource/csv_files/"))

# Autoloader WriteStream
(autoloader_df.writeStream
 .format("delta")
 .option("mergeSchema", "true")
 .option("checkpointLocation", "/mnt/autoloadersink/autostream_chckpnt")
 .start("/mnt/autoloadersink/autoloader_dt01"))

## The exception is raised after executing the command above.
```
I have given the following roles to the service principal I am using:
Additionally, the SAS token I generated for the queue had the following parameters:
I have tried granting the service principal all of the additional roles shown in the screenshots above, but I am still getting the same error.
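For context, the `queuesas` connection string was generated roughly as follows. This is only a sketch using the azure-storage-queue SDK: the account values are placeholders, the permission set is an assumption (the exact flags were only captured in the screenshot), and the connection-string format is one common SAS-based form.

```python
# Sketch: build a SAS-based connection string for the queue service.
# <storage-account-name> and <storage-account-key> are placeholders;
# the permission flags below are an assumption, not the original values.
from datetime import datetime, timedelta

from azure.storage.queue import (
    generate_account_sas,
    ResourceTypes,
    AccountSasPermissions,
)

sas_token = generate_account_sas(
    account_name="<storage-account-name>",
    account_key="<storage-account-key>",
    resource_types=ResourceTypes(service=True, container=True, object=True),
    permission=AccountSasPermissions(
        read=True, write=True, delete=True, list=True,
        add=True, create=True, update=True, process=True,
    ),
    expiry=datetime.utcnow() + timedelta(days=7),
)

# One common SAS connection-string form for the queue endpoint.
queuesas = (
    "QueueEndpoint=https://<storage-account-name>.queue.core.windows.net/;"
    f"SharedAccessSignature={sas_token}"
)
```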
Any solutions or suggestions would be highly valued.
Answer 1
Score: 1
Your permissions aren't sufficient, because an Event Grid subscription needs to be created. The documentation clearly specifies the necessary roles:

- On the storage account you need:
  - `Contributor`: will be used for setting up resources in your storage account, such as queues and event subscriptions.
  - `Storage Queue Data Contributor`: will be used to perform queue operations such as retrieving and deleting messages from the queues. (May not be required if you use DBR 8.1+ and provide a connection string.)
- On the resource group:
  - `EventGrid EventSubscription Contributor`: will be used to perform Event Grid subscription operations.

A sketch of how these assignments can be made programmatically is shown below.
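For illustration, the assignments could be scripted with the azure-identity and azure-mgmt-authorization packages along the following lines. This is a minimal sketch: `<subscription-id>` and `<sp-object-id>` are placeholders, the identity running it must itself be allowed to create role assignments, and doing this through the Azure Portal or CLI works just as well.

```python
# Sketch: assign the roles listed above to the service principal.
# Placeholders: <subscription-id>, <sp-object-id>. The credential used
# here needs Microsoft.Authorization/roleAssignments/write on the scopes.
import uuid

from azure.identity import AzureCliCredential
from azure.mgmt.authorization import AuthorizationManagementClient

subscription_id = "<subscription-id>"
sp_object_id = "<sp-object-id>"  # object id of the service principal

rg_scope = (f"/subscriptions/{subscription_id}"
            "/resourceGroups/AzureDataBricks_Exploration_RG")
storage_scope = (rg_scope +
                 "/providers/Microsoft.Storage/storageAccounts/rahulstorageek")

client = AuthorizationManagementClient(AzureCliCredential(), subscription_id)

def assign_role(scope: str, role_name: str) -> None:
    # Resolve the role definition id from its display name at this scope.
    role_def = next(client.role_definitions.list(
        scope, filter=f"roleName eq '{role_name}'"))
    client.role_assignments.create(
        scope,
        str(uuid.uuid4()),  # each new assignment needs a unique GUID name
        {"role_definition_id": role_def.id, "principal_id": sp_object_id},
    )

assign_role(storage_scope, "Contributor")
assign_role(storage_scope, "Storage Queue Data Contributor")
assign_role(rg_scope, "EventGrid EventSubscription Contributor")
```

Note that new role assignments can take a few minutes to propagate, so restart the stream after assigning them.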