Azure Workload Identity with Spark on Kubernetes
Question
How can I configure Spark to use Azure Workload Identity to access storage from AKS pods, without having to pass a client secret?
I am able to successfully pass these properties and connect to ADLS Gen 2 containers:
spark.conf.set("fs.azure.account.auth.type.<storage-account>.dfs.core.windows.net", "OAuth")
spark.conf.set("fs.azure.account.oauth.provider.type.<storage-account>.dfs.core.windows.net", "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set("fs.azure.account.oauth2.client.id.<storage-account>.dfs.core.windows.net", "<application-id>")
spark.conf.set("fs.azure.account.oauth2.client.secret.<storage-account>.dfs.core.windows.net", service_credential)
spark.conf.set("fs.azure.account.oauth2.client.endpoint.<storage-account>.dfs.core.windows.net", "https://login.microsoftonline.com/<directory-id>/oauth2/token")
However, I would like to take advantage of workload identity and not have to pass any secret.
I've also tried following the recommendations from Hadoop to use managed identity but to no avail.
https://hadoop.apache.org/docs/stable/hadoop-azure/abfs.html#Azure_Managed_Identity
<property>
<name>fs.azure.account.auth.type</name>
<value>OAuth</value>
<description>
Use OAuth authentication
</description>
</property>
<property>
<name>fs.azure.account.oauth.provider.type</name>
<value>org.apache.hadoop.fs.azurebfs.oauth2.MsiTokenProvider</value>
<description>
Use MSI for issuing OAuth tokens
</description>
</property>
<property>
<name>fs.azure.account.oauth2.msi.tenant</name>
<value></value>
<description>
Optional MSI Tenant ID
</description>
</property>
<property>
<name>fs.azure.account.oauth2.msi.endpoint</name>
<value></value>
<description>
MSI endpoint
</description>
</property>
<property>
<name>fs.azure.account.oauth2.client.id</name>
<value></value>
<description>
Optional Client ID
</description>
</property>
When we tried the above properties, we got back an error containing HTML.
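For comparison with the client-credentials example, the XML properties above can also be written as per-account Spark settings. The following is only a minimal sketch: the helper name `msi_abfs_settings` and its parameters are made up for illustration, and it assumes the same `<key>.<storage-account>.dfs.core.windows.net` suffix convention used in the working example. It only builds the key/value pairs and does not by itself make MSI authentication succeed.

```python
# Hypothetical helper (not part of Spark or Hadoop): builds the per-account
# ABFS settings that correspond to the MSI properties listed above.
MSI_PROVIDER = "org.apache.hadoop.fs.azurebfs.oauth2.MsiTokenProvider"


def msi_abfs_settings(storage_account, client_id="", tenant_id="", msi_endpoint=""):
    """Return a dict of Hadoop ABFS settings for MSI-based OAuth."""
    host = f"{storage_account}.dfs.core.windows.net"
    settings = {
        f"fs.azure.account.auth.type.{host}": "OAuth",
        f"fs.azure.account.oauth.provider.type.{host}": MSI_PROVIDER,
    }
    # These three are documented as optional, so only set them when provided.
    optional = {
        "fs.azure.account.oauth2.msi.tenant": tenant_id,
        "fs.azure.account.oauth2.msi.endpoint": msi_endpoint,
        "fs.azure.account.oauth2.client.id": client_id,
    }
    for key, value in optional.items():
        if value:
            settings[f"{key}.{host}"] = value
    return settings


# Usage, assuming an existing SparkSession named `spark`:
# for key, value in msi_abfs_settings("<storage-account>").items():
#     spark.conf.set(key, value)
```

Keeping the settings in a plain dict makes it easy to print them and confirm exactly which keys reach the session before debugging the token request itself.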
Answer 1
Score: 1
Secure access to Azure resources from AKS (using Managed Identities) was formerly handled by integrating aad-pod-identity into your AKS cluster.
So you need to make sure your cluster supports aad-pod-identity, and configure your workload and Kubernetes resources (pods, service accounts, etc.) accordingly.
See https://azure.github.io/aad-pod-identity/docs/demo/standard_walkthrough/
However, aad-pod-identity was marked deprecated in early 2023 and has been replaced by Azure Workload Identity.
Support for Workload Identity has not yet been implemented in the hadoop-azure project, though a Jira issue is pending for it: https://issues.apache.org/jira/browse/HADOOP-18610
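Separately from the Hadoop side, it can help to confirm from inside the pod that Azure Workload Identity is actually wired up. As a rough sketch, assuming the workload identity webhook has injected its usual environment variables (`AZURE_CLIENT_ID`, `AZURE_TENANT_ID`, `AZURE_FEDERATED_TOKEN_FILE`, `AZURE_AUTHORITY_HOST`), the hypothetical helper below reports which of them are missing. It is a diagnostic only and does not make hadoop-azure use these values.

```python
import os

# Environment variables that the Azure Workload Identity webhook is expected
# to inject into pods using an annotated service account (assumption: the
# cluster has the webhook installed and the pod is labeled to use it).
WORKLOAD_IDENTITY_VARS = (
    "AZURE_CLIENT_ID",
    "AZURE_TENANT_ID",
    "AZURE_FEDERATED_TOKEN_FILE",
    "AZURE_AUTHORITY_HOST",
)


def missing_workload_identity_vars(environ=None):
    """Return the names of expected variables that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in WORKLOAD_IDENTITY_VARS if not env.get(name)]


# Usage, run inside the Spark driver or executor pod:
# missing = missing_workload_identity_vars()
# print("missing:", missing if missing else "none")
```

If any of these are missing, the problem is likely in the pod or service account setup rather than in the Spark or Hadoop configuration.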