Azure Workload Identity with Spark on Kubernetes

Question

How can I configure Spark to use Azure Workload Identity to access storage from AKS pods, rather than passing a client secret?

I can successfully connect to ADLS Gen2 containers by passing these properties:

spark.conf.set("fs.azure.account.auth.type.<storage-account>.dfs.core.windows.net", "OAuth")
spark.conf.set("fs.azure.account.oauth.provider.type.<storage-account>.dfs.core.windows.net", "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set("fs.azure.account.oauth2.client.id.<storage-account>.dfs.core.windows.net", "<application-id>")
spark.conf.set("fs.azure.account.oauth2.client.secret.<storage-account>.dfs.core.windows.net", service_credential)
spark.conf.set("fs.azure.account.oauth2.client.endpoint.<storage-account>.dfs.core.windows.net", "https://login.microsoftonline.com/<directory-id>/oauth2/token")

However, I would like to take advantage of workload identity and not have to pass any secret.
I have also tried following the Hadoop documentation's recommendations for managed identity, but to no avail:
https://hadoop.apache.org/docs/stable/hadoop-azure/abfs.html#Azure_Managed_Identity

<property>
  <name>fs.azure.account.auth.type</name>
  <value>OAuth</value>
  <description>
  Use OAuth authentication
  </description>
</property>
<property>
  <name>fs.azure.account.oauth.provider.type</name>
  <value>org.apache.hadoop.fs.azurebfs.oauth2.MsiTokenProvider</value>
  <description>
  Use MSI for issuing OAuth tokens
  </description>
</property>
<property>
  <name>fs.azure.account.oauth2.msi.tenant</name>
  <value></value>
  <description>
  Optional MSI Tenant ID
  </description>
</property>
<property>
  <name>fs.azure.account.oauth2.msi.endpoint</name>
  <value></value>
  <description>
  MSI endpoint
  </description>
</property>
<property>
  <name>fs.azure.account.oauth2.client.id</name>
  <value></value>
  <description>
  Optional Client ID
  </description>
</property>
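
For a PySpark session, the same managed identity settings could be scoped to one storage account, the way the client-secret snippet earlier in the question does. The following is only a minimal sketch and not a verified fix: the helper name msi_conf and its parameters are hypothetical, and it assumes the per-account key suffix applies to the MSI keys just as it does to the client-credentials keys.

```python
def msi_conf(storage_account, client_id=None, tenant=None, endpoint=None):
    """Build account-scoped ABFS settings for MsiTokenProvider.

    Hypothetical helper: only the property names come from the Hadoop
    documentation quoted above; the function itself is illustrative.
    """
    suffix = f"{storage_account}.dfs.core.windows.net"
    conf = {
        f"fs.azure.account.auth.type.{suffix}": "OAuth",
        f"fs.azure.account.oauth.provider.type.{suffix}":
            "org.apache.hadoop.fs.azurebfs.oauth2.MsiTokenProvider",
    }
    # The remaining properties are optional; skip any that were left empty.
    optional = {
        "fs.azure.account.oauth2.msi.tenant": tenant,
        "fs.azure.account.oauth2.msi.endpoint": endpoint,
        "fs.azure.account.oauth2.client.id": client_id,
    }
    for key, value in optional.items():
        if value:
            conf[f"{key}.{suffix}"] = value
    return conf
```

With an active session, each pair would then be applied with spark.conf.set(key, value). Leaving the optional values out, rather than setting them to empty strings, keeps the provider on its defaults.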

When we tried the properties above, we got back an error response containing HTML.

[Screenshot: error from using managed identity properties]

Answer 1

Score: 1

Secure access to Azure resources from AKS (using managed identities) was formerly handled by integrating aad-pod-identity into the AKS cluster.
With that approach, you need to make sure your cluster supports aad-pod-identity, and configure your workload and Kubernetes resources (pods, service accounts, etc.) accordingly.
See https://azure.github.io/aad-pod-identity/docs/demo/standard_walkthrough/

However, aad-pod-identity was marked deprecated in early 2023 and replaced by Azure Workload Identity.
Support for Workload Identity is not yet implemented in the hadoop-azure project, although a Jira issue is open for it: https://issues.apache.org/jira/browse/HADOOP-18610
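
For context, the token exchange that Workload Identity relies on can be sketched as follows. This is an illustration under stated assumptions, not part of hadoop-azure: it assumes the pod has been set up by the Azure Workload Identity webhook, which injects AZURE_CLIENT_ID, AZURE_TENANT_ID, AZURE_AUTHORITY_HOST and AZURE_FEDERATED_TOKEN_FILE, and the function names below are hypothetical. The sketch only builds the request and does not send it.

```python
import os
from urllib.parse import urlencode

# Assumed scope for Azure Storage access tokens.
STORAGE_SCOPE = "https://storage.azure.com/.default"


def build_token_request(tenant_id, client_id, assertion,
                        authority="https://login.microsoftonline.com/",
                        scope=STORAGE_SCOPE):
    """Return (url, form_body) for exchanging a federated token (hypothetical helper).

    The projected service account token is presented as a client assertion
    in place of a client secret.
    """
    url = f"{authority.rstrip('/')}/{tenant_id}/oauth2/v2.0/token"
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "scope": scope,
        "client_assertion_type":
            "urn:ietf:params:oauth:client-assertion-type:jwt-bearer",
        "client_assertion": assertion,
    })
    return url, body


def build_token_request_from_pod_env(env=os.environ):
    """Read the webhook-injected variables and the projected token file."""
    with open(env["AZURE_FEDERATED_TOKEN_FILE"]) as token_file:
        assertion = token_file.read().strip()
    return build_token_request(
        env["AZURE_TENANT_ID"],
        env["AZURE_CLIENT_ID"],
        assertion,
        authority=env.get("AZURE_AUTHORITY_HOST",
                          "https://login.microsoftonline.com/"),
    )
```

POSTing the returned body to the returned URL as application/x-www-form-urlencoded should yield a JSON response with an access_token field. Wiring such a token into ABFS would still need a token provider on the Hadoop side, which is what the Jira issue above tracks.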

Posted by huangapple on 2023-04-11 00:53:36
Source: https://go.coder-hub.com/75979010.html