Airflow webserver showing next run as start of data interval

Question

I have a DAG like this:

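from datetime import timedelta

import pendulum
from airflow.decorators import dag

@dag(
    dag_id="data-sync",
    schedule_interval="*/30 * * * *",  # every 30 minutes
    start_date=pendulum.datetime(2023, 3, 9, tz="Asia/Hong_Kong"),
    catchup=False,  # do not backfill missed intervals
    dagrun_timeout=timedelta(minutes=20),
)
def data_sync():
    ...  # tasks omitted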

So it runs every 30 minutes, starting today in my timezone, with no catchup.
In the webserver UI I see these fields:

[Screenshot: DAG details in the Airflow webserver UI]

What I find strange among these fields is the next run time. I was looking at it between 21:01 and 21:29, and it still showed the next run as 21:00; in other words, the next run is in the past.

Does the next run mean the logical date in Airflow, that is, the start time of the interval? It is quite unintuitive to look at it and see a time in the past.

Answer 1

Score: 1

What you see is the logical date.

> It is quite unintuitive to look at it and see a time in the past.

If you think in terms of data pipelines, it is intuitive.
Let's explain this with a daily run for simplicity.
With a daily interval, the data for 2023-02-01 is complete only on 2023-02-02: at 2023-02-02 00:00 you have the full data for 2023-02-01 00:00 to 2023-02-02 00:00, so the workflow for 2023-02-01 can start no earlier than 2023-02-02. The same goes for hourly jobs.
Normally you care about which date's data is ready, and less about the timestamp at which the run actually executes.
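
The daily case can be sketched with plain `datetime` arithmetic. This only illustrates the idea and is not Airflow's scheduler code; `daily_run` is a hypothetical helper name, and UTC is assumed for simplicity.

```python
from datetime import date, datetime, timedelta, timezone


def daily_run(day: date) -> dict:
    """Describe the daily run that processes the data of `day` (UTC assumed)."""
    start = datetime(day.year, day.month, day.day, tzinfo=timezone.utc)
    end = start + timedelta(days=1)
    return {
        "logical_date": start,        # the date the run is labelled with
        "data_interval_start": start,
        "data_interval_end": end,
        "run_after": end,             # earliest moment the run can start
    }


run = daily_run(date(2023, 2, 1))
print(run["logical_date"].isoformat())  # 2023-02-01T00:00:00+00:00
print(run["run_after"].isoformat())     # 2023-02-02T00:00:00+00:00
```

The run is labelled 2023-02-01 but cannot start before 2023-02-02 00:00, which is why the label always trails the wall-clock time.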

If you want to know when the process is actually going to run, hover over the Next Run indicator in the Graph View:

[Screenshot: tooltip shown when hovering over the Next Run indicator in the Graph View]

In this case (using your code example), the run with logical date 2023-03-09 14:30 will start at 2023-03-09 15:00, because that is when the run's 30-minute interval ends. That is 7 minutes away (note that the current time is 14:53 UTC, as shown in the yellow bar).
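
These numbers can be reproduced with a small calculation. The sketch below assumes a fixed-length schedule whose boundaries line up with the Unix epoch (which holds for `*/30 * * * *`) and `catchup=False`; `next_run` is a hypothetical helper, not an Airflow API.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def next_run(now: datetime, interval: timedelta) -> tuple[datetime, datetime]:
    """Return (logical_date, run_after) for the interval that contains `now`."""
    # Floor `now` to the most recent schedule boundary.
    elapsed = (now - EPOCH) // interval  # whole intervals since the epoch
    start = EPOCH + elapsed * interval
    return start, start + interval


now = datetime(2023, 3, 9, 14, 53, tzinfo=timezone.utc)
logical_date, run_after = next_run(now, timedelta(minutes=30))
print(logical_date.strftime("%H:%M"))  # 14:30
print(run_after.strftime("%H:%M"))     # 15:00
print(run_after - now)                 # 0:07:00
```

The same reasoning explains the question: at any moment between 21:01 and 21:29, the interval in progress started at 21:00, so the next run carries the logical date 21:00 and starts at 21:30.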

huangapple
  • Posted on March 9, 2023, 22:21:52
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75685860.html