Azure Databricks is unable to create an Event Grid Subscription for Autoloader Streams

Question

I am trying to create an autoloader stream in Azure Databricks.
Now, when I try to start the writeStream, I am presented with an exception saying:

> com.databricks.sql.cloudfiles.errors.CloudFilesException: Failed to create an Event Grid subscription. Please make sure that your service principal has 'write' permissions (e.g., assign it a Contributor role) on the storage account rahulstorageek in order to create Event Grid Subscriptions

Following is the code I am using:

# Spark and cloudFiles configurations
spark.conf.set("fs.azure.account.key.<My ADLS Gen2 Storage account name>.blob.core.windows.net",
               "<Access Key 2 of my Storage Account>")

queuesas = "<SAS connection string for the Queue Storage>"

cloudfilesConf = {
    "cloudFiles.subscriptionId": "<Azure Free Trial Subscription Id>",
    "cloudFiles.connectionString": queuesas,
    "cloudFiles.format": "csv",
    "cloudFiles.tenantId": "<Service Principal's tenant Id>",
    "cloudFiles.clientId": "<Service Principal's client Id>",
    "cloudFiles.clientSecret": "<Service Principal's generated client secret value>",
    "cloudFiles.resourceGroup": "AzureDataBricks_Exploration_RG",
    "cloudFiles.useNotifications": "true",
    "cloudFiles.includeExistingFiles": "true",
    "cloudFiles.validateOptions": "true",
}
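
# For reference, a SAS connection string for Queue Storage generated from the
# Azure portal ("Shared access signature" blade) typically has a shape like the
# hypothetical placeholder below; every value here is a placeholder, not a
# working token:
#
#   queuesas = ("QueueEndpoint=https://<storage account>.queue.core.windows.net/;"
#               "SharedAccessSignature=sv=<version>&ss=q&srt=sco"
#               "&sp=<permissions>&se=<expiry>&sig=<signature>")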

# Creating a manual schema for the incoming data
from pyspark.sql.functions import *
from pyspark.sql.types import *

dataset_schema = StructType([
    StructField("YearsExperience", DoubleType(), True),
    StructField("Salary", IntegerType(), True)])

# Autoloader ReadStream
autoloader_df = (spark.readStream.format("cloudFiles")
      .options(**cloudfilesConf)
      .option("recursiveFileLookup","true")
      .schema(dataset_schema)
      .load("/mnt/autoloadersource/csv_files/")
      )

# Autoloader Writestream
(autoloader_df.writeStream
 .format("delta")
 .option("mergeSchema", "true")
 .option("checkpointLocation", "/mnt/autoloadersink/autostream_chckpnt")
 .start("/mnt/autoloadersink/autoloader_dt01"))
## The exception is raised after executing the command above.
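
For reference, once the stream does start successfully, the sink can be sanity-checked by reading the Delta output path from above back as a batch DataFrame:

# Quick sanity check once the stream is running: read the sink back as a batch.
spark.read.format("delta").load("/mnt/autoloadersink/autoloader_dt01").show(5)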

I have given the following roles to the service principal I am using:

[Screenshot: role assignments for the service principal]

Additionally, the SAS token I generated for the Queue was created with the following parameters:

[Screenshot: SAS token parameters]

I have tried giving the service principal all the additional roles you can see in the screenshots above, but I am still getting the same error.

Any solutions or suggestions would be highly valued.


Answer 1

Score: 1

Your permissions aren't sufficient, because an Event Grid subscription needs to be created. The documentation clearly specifies the necessary roles:

  • On the storage account you need:

    • Contributor: used for setting up resources in your storage account, such as queues and event subscriptions.
    • Storage Queue Data Contributor: used to perform queue operations such as retrieving and deleting messages from the queues. (May not be required if you use DBR 8.1+ and provide a connection string.)
  • On the resource group:

    • EventGrid EventSubscription Contributor: used to perform Event Grid subscription operations.
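
As a side note, the Event Grid subscription is only created because the stream sets cloudFiles.useNotifications to "true". If granting those roles is not possible, a workaround is to let Auto Loader use its default directory-listing mode, which creates no Event Grid subscription or storage queue at all, at the cost of listing the input path on every micro-batch. A minimal sketch, reusing the paths and schema from the question:

# Fallback sketch: directory-listing mode needs no Event Grid subscription or
# storage queue, so none of the role assignments above are required for it.
# Paths, schema, and checkpoint are reused from the question; when switching
# modes, consider pointing checkpointLocation at a fresh path.
fallback_df = (spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "csv")
    .option("cloudFiles.useNotifications", "false")  # directory listing mode
    .option("cloudFiles.includeExistingFiles", "true")
    .schema(dataset_schema)
    .load("/mnt/autoloadersource/csv_files/"))

(fallback_df.writeStream
    .format("delta")
    .option("checkpointLocation", "/mnt/autoloadersink/autostream_chckpnt")
    .start("/mnt/autoloadersink/autoloader_dt01"))

Directory listing scales worse than file notifications for very large directories, so the role assignments above remain the right fix for production workloads.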
