How do I assign Cloud Composer DAGs to run on specific Node Pools?

# Question

I am using Cloud Composer (Apache Airflow) on Google Cloud. Some of our processes require more resources than what's available on Composer's default node pool, so I've created an additional node pool within our cluster. The resource-intensive DAGs use the KubernetesPodOperator and specifically target the special node pool through the `affinity={ nodeAffinity... }` attribute.

My issue is that since creating the new node pool, I've noticed that ALL of my workloads are being scheduled on this new pool. How can I keep my normal workloads running on the default pool, while reserving the new node pool for special use cases?

Here is an example KubernetesPodOperator definition that targets the special pool. The regular KubernetesPodOperators don't have the affinity attribute filled out:

```python
from airflow.contrib.operators.kubernetes_pod_operator import KubernetesPodOperator

KubernetesPodOperator(
    namespace='default',
    image="image_name",
    image_pull_policy='Always',
    name="example_name",
    task_id="example_name",
    get_logs=True,
    # Pin this pod to the resource-intensive node pool.
    affinity={
        'nodeAffinity': {
            'requiredDuringSchedulingIgnoredDuringExecution': {
                'nodeSelectorTerms': [{
                    'matchExpressions': [{
                        'key': 'cloud.google.com/gke-nodepool',
                        'operator': 'In',
                        'values': ['datascience-pool']
                    }]
                }]
            }
        }
    },
    is_delete_operator_pod=True,
    dag=dag)
```


# Answer 1
**Score**: 2

The KubernetesPodOperator does not have any default affinity preferences, so the decision that scheduled your normal workloads onto the new node pool was made by the Kubernetes scheduler. To avoid this, you will now have to set affinity on all instances of KubernetesPodOperator (which you can make somewhat less painful by using `default_args` and the `apply_defaults` Airflow decorator).

At least as of Cloud Composer versions up to v1.8.3, the Composer system pods will always run in the node pool `default-pool`. Therefore, you can use this to ensure your pods run in the Composer node pool instead of a custom one.
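
A minimal sketch of that approach (the DAG name, image, and task id are placeholders, and the import path assumes the Airflow 1.10 line that Composer shipped at the time): put a `default-pool` affinity block into the DAG's `default_args`, and every `KubernetesPodOperator` that does not pass its own `affinity` picks it up through `apply_defaults`.

```python
from datetime import datetime

from airflow import DAG
from airflow.contrib.operators.kubernetes_pod_operator import KubernetesPodOperator

# Affinity block that pins pods to Composer's default node pool.
DEFAULT_POOL_AFFINITY = {
    'nodeAffinity': {
        'requiredDuringSchedulingIgnoredDuringExecution': {
            'nodeSelectorTerms': [{
                'matchExpressions': [{
                    'key': 'cloud.google.com/gke-nodepool',
                    'operator': 'In',
                    'values': ['default-pool']
                }]
            }]
        }
    }
}

default_args = {
    'start_date': datetime(2020, 1, 1),
    # Operators are wrapped in apply_defaults, so any constructor argument
    # listed here (including affinity) is filled in automatically unless the
    # task passes its own value.
    'affinity': DEFAULT_POOL_AFFINITY,
}

with DAG('example_dag', default_args=default_args, schedule_interval=None) as dag:
    # Inherits the default-pool affinity from default_args.
    regular_task = KubernetesPodOperator(
        namespace='default',
        image='image_name',
        name='regular_task',
        task_id='regular_task',
        is_delete_operator_pod=True,
    )
```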




# Answer 2
**Score**: 0

I don't know if it is a workaround, but I have solved this issue by assigning affinity to all of the tasks. Tasks requiring high CPU or high memory are assigned to the respective node pool, and default tasks are assigned to `default-pool`. This resolves the issue; I have tested it in many flows.
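
As a sketch of what that per-task assignment can look like (the `pool_affinity` helper, DAG name, image, and task ids below are hypothetical, not from the original answer):

```python
from datetime import datetime

from airflow import DAG
from airflow.contrib.operators.kubernetes_pod_operator import KubernetesPodOperator


def pool_affinity(pool_name):
    """Build a nodeAffinity block that pins a pod to the given GKE node pool."""
    return {
        'nodeAffinity': {
            'requiredDuringSchedulingIgnoredDuringExecution': {
                'nodeSelectorTerms': [{
                    'matchExpressions': [{
                        'key': 'cloud.google.com/gke-nodepool',
                        'operator': 'In',
                        'values': [pool_name]
                    }]
                }]
            }
        }
    }


dag = DAG('mixed_pools_example', start_date=datetime(2020, 1, 1), schedule_interval=None)

# Resource-hungry work is pinned to the custom pool...
heavy_task = KubernetesPodOperator(
    namespace='default',
    image='image_name',
    name='heavy_task',
    task_id='heavy_task',
    affinity=pool_affinity('datascience-pool'),
    is_delete_operator_pod=True,
    dag=dag)

# ...while everything else is pinned explicitly to default-pool.
light_task = KubernetesPodOperator(
    namespace='default',
    image='image_name',
    name='light_task',
    task_id='light_task',
    affinity=pool_affinity('default-pool'),
    is_delete_operator_pod=True,
    dag=dag)
```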



