Airflow BashOperator returns exit code 0 even when the task fails and returns exit code 1
Question
I am trying to run a Spark job from Airflow's BashOperator with Kubernetes. I have configured an on_failure_callback, but even though the Spark job failed with exit code 1, my task is always marked as a success and the failure callback is never called. The following are snippets of the Airflow log:
[2020-01-03 13:22:46,730] {{bash_operator.py:128}} INFO - 20/01/03 13:22:46 INFO LoggingPodStatusWatcherImpl: Container final statuses:
[2020-01-03 13:22:46,730] {{bash_operator.py:128}} INFO -
[2020-01-03 13:22:46,730] {{bash_operator.py:128}} INFO -
[2020-01-03 13:22:46,730] {{bash_operator.py:128}} INFO - Container name: spark-kubernetes-driver
[2020-01-03 13:22:46,730] {{bash_operator.py:128}} INFO - Container image: XXXXXXXXX.dkr.ecr.us-east-1.amazonaws.com/spark-py:XX_XX
[2020-01-03 13:22:46,730] {{bash_operator.py:128}} INFO - Container state: Terminated
[2020-01-03 13:22:46,730] {{bash_operator.py:128}} INFO - Exit code: 1
[2020-01-03 13:22:46,731] {{bash_operator.py:128}} INFO - 20/01/03 13:22:46 INFO Client: Application run_report_generator finished.
[2020-01-03 13:22:46,736] {{bash_operator.py:128}} INFO - 20/01/03 13:22:46 INFO ShutdownHookManager: Shutdown hook called
[2020-01-03 13:22:46,737] {{bash_operator.py:128}} INFO - 20/01/03 13:22:46 INFO ShutdownHookManager: Deleting directory /tmp/spark-adb99a7e-ce6c-49f6-8307-a17c28448043
[2020-01-03 13:22:46,761] {{bash_operator.py:132}} INFO - Command exited with return code 0
[2020-01-03 13:22:49,994] {{logging_mixin.py:95}} INFO - [2020-01-03 13:22:49,994] {{local_task_job.py:105}} INFO - Task exited with return code 0
Answer 1
Score: 1
You need to use set -e to ensure the BashOperator stops execution and returns an error on any non-zero exit code.
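A minimal sketch of how this might look (the DAG id, the spark-submit arguments, and the notify_on_failure callback are illustrative assumptions, not taken from the question):

from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator  # Airflow 1.x import path

def notify_on_failure(context):
    # Hypothetical failure callback; Airflow invokes it only when the task fails.
    print("Task failed:", context["task_instance"].task_id)

with DAG("spark_report", start_date=datetime(2020, 1, 1), schedule_interval=None) as dag:
    run_spark_job = BashOperator(
        task_id="run_spark_job",
        # set -e aborts the shell on the first command that exits non-zero,
        # so a failure is not masked by a later command that succeeds.
        bash_command="set -e; spark-submit --master k8s://https://<api-server> local:///app/run_report_generator.py",
        on_failure_callback=notify_on_failure,
    )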
Answer 2
Score: 0
You have to make sure that the last exit code is not 0. From your log you have this:
[2020-01-03 13:22:46,761] {{bash_operator.py:132}} INFO - Command exited with return code 0
so the BashOperator treats the whole job as a success. The solution is to make the exit code explicitly non-zero when the job fails. For example, in Python you can have:
import sys

# Exit with a non-zero code so the BashOperator marks the task as failed.
if condition_for_exiting:
    sys.exit(1)
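For completeness, a sketch of wiring such a script into the DAG (the script path and ids are assumptions): the BashOperator marks the task failed whenever the command it runs exits with a non-zero code, which in turn triggers the failure callback.

from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator  # Airflow 1.x import path

with DAG("report_dag", start_date=datetime(2020, 1, 1), schedule_interval=None) as dag:
    generate_report = BashOperator(
        task_id="generate_report",
        # python's return code is the value passed to sys.exit(); any
        # non-zero value makes the BashOperator raise and fail the task.
        bash_command="python /opt/jobs/run_report_generator.py",  # hypothetical script path
    )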