How to delete ingested data when an Azure Data Factory pipeline run is cancelled?

Question

Right now I have an Azure Data Factory pipeline with 4 Copy activities, each of which copies a specific table from a SQL database to a specific table in psql. The pipeline is triggered from Python code, and I have a function that can cancel the pipeline run.

I want to implement a feature so that, when the pipeline run is cancelled, all the data that has already been copied into my psql is deleted. But I have no idea how.

I have looked into the Delete activity and the Lookup activity. The Delete activity does not support psql. The Lookup activity seems promising, but I don't know how to trigger it only when the pipeline is cancelled.

Any help will be appreciated. Thank you!

Answer 1

Score: 0

When you cancel a pipeline run, no further activities within the same pipeline are processed. So once you cancel a pipeline, there is no way for that pipeline to delete the data it has already copied into the sink, because cancelling is an abrupt stop.

And since ADF/Synapse pipelines copy data in batches and do not support transactions, the best approach for you would be to load the data into staging tables first and then load it into the final tables inside a transaction.
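
As a rough illustration of the staging idea, here is a minimal sketch in Python. It is not ADF code: it uses the standard-library `sqlite3` module as a stand-in for psql so that it runs anywhere, and the table names (`orders_staging`, `orders`) and both helper functions are made up for the example. The point is only that a cancelled run leaves nothing but the staging table to clean up, while a successful run is promoted in a single transaction.

```python
import sqlite3

def discard_staging(conn, staging_table):
    """After a cancelled run: throw away the partially copied rows."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(f"DELETE FROM {staging_table}")

def promote_staging(conn, staging_table, final_table):
    """After a successful run: move rows to the final table in one transaction."""
    with conn:
        conn.execute(f"INSERT INTO {final_table} SELECT * FROM {staging_table}")
        conn.execute(f"DELETE FROM {staging_table}")

# In-memory database standing in for the psql sink (assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders_staging (id INTEGER, amount REAL)")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")

# Run 1 is cancelled half-way: only the staging table was touched.
conn.execute("INSERT INTO orders_staging VALUES (1, 9.5)")
discard_staging(conn, "orders_staging")

# Run 2 completes: the copy step filled staging, then we promote it.
conn.executemany("INSERT INTO orders_staging VALUES (?, ?)", [(1, 9.5), (2, 20.0)])
promote_staging(conn, "orders_staging", "orders")

final_rows = conn.execute("SELECT id, amount FROM orders ORDER BY id").fetchall()
staging_count = conn.execute("SELECT COUNT(*) FROM orders_staging").fetchone()[0]
print(final_rows, staging_count)
```

The table names are interpolated directly into the SQL here only because they are fixed, trusted strings in this sketch. Against a real psql database the same two statements would run through whatever PostgreSQL driver you already use.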

Also, if you want to delete the data in the final tables as well after a cancellation, you would have to write your own custom logic:

  1. Create a log table that records the end status of each pipeline run, and copy the pipeline run ID into your tables as an audit column on the records that are copied.
  2. Integrate Log Analytics with ADF. When a pipeline run is cancelled, get its pipeline run ID, validate it against the log table, and then trigger a Lookup activity (in a separate run, since the cancelled run cannot execute anything further) to delete the rows with that pipeline run ID.
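
The two steps above can be sketched as follows. This is a hedged illustration rather than the actual ADF setup: `sqlite3` again stands in for psql, and the names `pipeline_run_log`, `pipeline_run_id`, `customers` and `delete_rows_for_cancelled_runs` are all hypothetical. In a real pipeline the audit column would have to be filled during the copy (for example from the `@pipeline().RunId` system variable), and something outside the cancelled run would have to write the final status into the log table.

```python
import sqlite3

def delete_rows_for_cancelled_runs(conn, target_tables):
    """Delete rows whose audit column matches a run logged as Cancelled."""
    deleted = 0
    with conn:  # one transaction for all target tables
        for table in target_tables:
            cur = conn.execute(
                f"DELETE FROM {table} WHERE pipeline_run_id IN "
                "(SELECT run_id FROM pipeline_run_log WHERE status = 'Cancelled')"
            )
            deleted += cur.rowcount
    return deleted

conn = sqlite3.connect(":memory:")
# Step 1: log table with the end status of each run (hypothetical schema).
conn.execute("CREATE TABLE pipeline_run_log (run_id TEXT PRIMARY KEY, status TEXT)")
# Target table with the pipeline run ID as an audit column.
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT, pipeline_run_id TEXT)")

conn.executemany(
    "INSERT INTO pipeline_run_log VALUES (?, ?)",
    [("run-a", "Succeeded"), ("run-b", "Cancelled")],
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Ann", "run-a"), (2, "Bob", "run-b"), (3, "Cy", "run-b")],
)
conn.commit()

# Step 2: clean up everything that was copied by cancelled runs.
deleted = delete_rows_for_cancelled_runs(conn, ["customers"])
remaining = conn.execute("SELECT id, pipeline_run_id FROM customers").fetchall()
print(deleted, remaining)
```

Since the pipeline in the question is already triggered and cancelled from Python, one option is to run a cleanup like this from that same Python code once the run reports a cancelled status, instead of from a Lookup activity. Either way the delete has to happen outside the cancelled run.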

huangapple
  • Posted on May 11, 2023, 05:47:43
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76222765.html