Airflow with KubernetesExecutor runs DAG tasks slowly
Question
We run Airflow with the KubernetesExecutor.
Starting a new task on schedule takes a long time.
This is the usual state when a couple of DAGs must start at the same time, with a lot of tasks stuck in the "queued" state:
It looks like we do not have enough workers, but our DevOps team says that each DAG task gets its own worker.
In the image below we can see that the total DAG execution time is 1:57:
But the task execution times add up to at most 3 seconds.
Any ideas why it takes so long just to start a task?
I suspect a problem with Kubernetes or its configuration, but I have no proof and lack the expertise to prove it.
P.S. I am not on the DevOps team, so unfortunately I do not have access to the server or the Kubernetes configuration, but I can request it if needed.
P.P.S. On the previous server with LocalExecutor everything worked as expected, without delays.
Answer 1
Score: 1
The Kubernetes executor spins up a new pod for each task, and that includes the startup time of Airflow's worker process. During that startup time, the task stays in the "queued" state. Depending on your Kubernetes setup and the nature of the worker pod you are trying to spin up, startup times of 2 minutes are not unheard of. Your best bet for finding out where the time goes is to watch the pod on K8s with your ops team: it may be sitting in the Pending state, or it may be Running but still going through the Airflow startup.
The LocalExecutor, by contrast, has none of that startup time, so it makes sense that tasks start faster on the LocalExecutor.
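To make the Pending-versus-Running distinction concrete, here is a minimal Python sketch that splits a worker pod's startup into two parts: the wait until the pod is scheduled onto a node, and the time from scheduling until the first container starts. It assumes you (or your ops team) can export a pod description with `kubectl get pod <pod-name> -o json` and that timestamps have whole-second precision. The function name `startup_breakdown`, the result keys, and the `SAMPLE_POD` values are hypothetical and only for illustration; they are not measurements from the cluster in the question.

```python
from datetime import datetime, timezone


def parse_ts(ts):
    """Parse a Kubernetes timestamp such as 2023-05-01T10:00:00Z (assumed format)."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def startup_breakdown(pod):
    """Split pod startup into scheduling wait and container start time, in seconds.

    `pod` is a dict shaped like the output of `kubectl get pod <name> -o json`.
    A value is None when the pod has not reached that stage yet.
    """
    created = parse_ts(pod["metadata"]["creationTimestamp"])
    status = pod.get("status", {})

    # Time at which the scheduler placed the pod on a node.
    scheduled = None
    for cond in status.get("conditions", []):
        if cond.get("type") == "PodScheduled" and cond.get("status") == "True":
            scheduled = parse_ts(cond["lastTransitionTime"])

    # Earliest start time among the pod's containers (running or already finished).
    started = None
    for cs in status.get("containerStatuses", []):
        state = cs.get("state", {})
        info = state.get("running") or state.get("terminated")
        if info and info.get("startedAt"):
            t = parse_ts(info["startedAt"])
            started = t if started is None else min(started, t)

    scheduling_wait = (scheduled - created).total_seconds() if scheduled else None
    container_start = (
        (started - scheduled).total_seconds() if scheduled and started else None
    )
    return {"scheduling_wait_s": scheduling_wait, "container_start_s": container_start}


# Hypothetical pod data, invented for this example.
SAMPLE_POD = {
    "metadata": {"creationTimestamp": "2023-05-01T10:00:00Z"},
    "status": {
        "conditions": [
            {
                "type": "PodScheduled",
                "status": "True",
                "lastTransitionTime": "2023-05-01T10:00:45Z",
            }
        ],
        "containerStatuses": [
            {"state": {"running": {"startedAt": "2023-05-01T10:01:50Z"}}}
        ],
    },
}

if __name__ == "__main__":
    print(startup_breakdown(SAMPLE_POD))
```

A large scheduling wait would suggest the pod sat in Pending, for example while waiting for node capacity. A large container start time would more likely point at image pulls or container creation. Any delay left over after the container has started is probably Airflow's own startup inside the pod, which this sketch cannot see from the pod description alone.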