Airflow DAG does not run on schedule, "next run" is 4 hours ago

Question

I have an Airflow DAG (Airflow 2.6) where the code looks like this:

import pendulum
from airflow import DAG

with DAG(dag_id="my_dag",
         start_date=pendulum.datetime(2023, 5, 1, tz="America/Chicago"),
         schedule_interval="0 4 * * *",  # Every day at 4 AM Central
         catchup=False) as dag:
    ...  # tasks omitted

Note that this is a timezone-aware DAG for the America/Chicago timezone: https://airflow.apache.org/docs/apache-airflow/stable/authoring-and-scheduling/timezone.html#time-zone-aware-dags

We turned this DAG on yesterday evening and it immediately began running for its 2023-05-22 run, which we allowed to happen.

Then we expected it to run at 4 AM today, 2023-05-23, but it never did.

The UI shows that the data interval ended at 4 AM this morning, but the DAG never triggered. It even says the next run was 5 hours ago. This plainly seems like a bug to me, but I was hoping for a second opinion in case I don't understand scheduling in Airflow. Is there something I'm doing wrong here? Why did it not run at 4 AM?
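For reference, Airflow schedules a DAG run once its data interval has ended, which is why a run was expected at 4 AM. The sketch below is a simplified standard-library model of a daily 4 AM schedule, not Airflow's own timetable code. The helper `latest_daily_interval`, the fixed `CDT` offset, and the 8 AM value of `now` are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC-5 offset stands in for America/Chicago in May 2023.
CDT = timezone(timedelta(hours=-5), "CDT")


def latest_daily_interval(now: datetime, hour: int = 4):
    """Return (start, end) of the most recent complete daily interval ending at `hour`."""
    end = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if end > now:  # today's boundary has not been reached yet
        end -= timedelta(days=1)
    return end - timedelta(days=1), end


# Hypothetical "now": a few hours after the expected 4 AM trigger.
now = datetime(2023, 5, 23, 8, 0, tzinfo=CDT)
start, end = latest_daily_interval(now)
overdue_by = now - end  # how long ago the run should have started

print(start.isoformat(), "->", end.isoformat())
print("overdue by", overdue_by)
```

Under these assumptions the interval runs from 4 AM on 2023-05-22 to 4 AM on 2023-05-23, so a healthy scheduler would have started the run at the interval's end rather than hours later.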

Thank you!

EDIT: At exactly 9 AM CDT (14:00 UTC) the DAG kicked off and ran. This is even more confusing to me: my schedule is 0 4 * * * in the America/Chicago timezone, and I don't understand how that translates to 9 AM America/Chicago or 2 PM UTC.
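The time arithmetic can be checked with a short standard-library sketch. It assumes a fixed UTC-5 offset for CDT, and `to_utc` is a hypothetical helper. The sketch only shows that the observed delay equals the UTC offset; it does not show why the scheduler behaved this way.

```python
from datetime import datetime, timedelta, timezone

# Assumption: Central Daylight Time modeled as a fixed UTC-5 offset.
CDT = timezone(timedelta(hours=-5), "CDT")


def to_utc(local: datetime) -> datetime:
    """Convert a timezone-aware datetime to UTC."""
    return local.astimezone(timezone.utc)


expected_local = datetime(2023, 5, 23, 4, 0, tzinfo=CDT)  # cron "0 4 * * *"
observed_local = datetime(2023, 5, 23, 9, 0, tzinfo=CDT)  # when the run actually started

expected_utc = to_utc(expected_local)
observed_utc = to_utc(observed_local)
delay = observed_utc - expected_utc

print(expected_utc.isoformat())  # 2023-05-23T09:00:00+00:00
print(observed_utc.isoformat())  # 2023-05-23T14:00:00+00:00
print(delay == -CDT.utcoffset(None))  # True: the delay equals the 5-hour UTC offset
```

The expected trigger of 4 AM CDT is 09:00 UTC, and the run started at 9 AM CDT. This is consistent with a UTC timestamp being read back as local time, although that reading is only an inference from the numbers.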

Answer 1

Score: 0

I opened a GitHub issue for this with the Airflow project: https://github.com/apache/airflow/issues/31487. It turns out this is a bug that occurs when MSSQL is the backing database. Airflow has only experimental support for MSSQL, and the maintainers are considering removing it entirely because no one has the time to maintain and develop it: https://github.com/apache/airflow/issues/31487#issuecomment-1561974532.

If you are using MSSQL as your backing DB and hitting this bug, there is a hacky fix: https://github.com/apache/airflow/issues/21171#issuecomment-1363016567. You need to change two lines of code in the Airflow package in your Python environment, as described in that comment.

Since we are still in the proof-of-concept phase, and since MSSQL support may be pulled in future Airflow releases, we decided to switch to PostgreSQL, which has the most usage and support in the Airflow community.
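For anyone making the same switch, here is a minimal sketch of pointing Airflow at a PostgreSQL metadata database through an environment variable. The helper `postgres_conn_uri` and all credential, host, and database values are placeholders, not taken from our setup. It assumes Airflow 2.3 or later, where the setting lives in the `[database]` section, and that the `psycopg2` driver is installed.

```python
import os
from urllib.parse import quote


def postgres_conn_uri(user: str, password: str, host: str, database: str) -> str:
    """Build a SQLAlchemy-style PostgreSQL URI, percent-encoding the credentials."""
    return (
        f"postgresql+psycopg2://{quote(user, safe='')}:"
        f"{quote(password, safe='')}@{host}/{database}"
    )


# Placeholder values for illustration only.
uri = postgres_conn_uri("airflow_user", "p@ss/word", "localhost", "airflow_db")

# Airflow reads configuration overrides from AIRFLOW__{SECTION}__{KEY} variables.
os.environ["AIRFLOW__DATABASE__SQL_ALCHEMY_CONN"] = uri
print(uri)
```

Percent-encoding the password matters when it contains characters such as `@` or `/`, which would otherwise break URI parsing.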

huangapple
  • Posted on 2023-05-24 22:07:25