Best methods to validate/test output data against the original data

Question

I have built data flows in Azure Data Factory from customer input data. They reproduce workflows that were originally built in Alteryx (another ETL tool).

Testing with sample data looks fine on both sides.

But how can I validate that the entire Alteryx output matches the entire Azure output? Is there a tool for this?

My output file format is CSV.

Is there an automated process to validate all rows of the Alteryx output against the Azure output, so that I can be sure I have built the correct data flow logic?

Answer 1

Score: 0

You should be able to verify the whole dataset with the Validation activity, if that's what you want (i.e., checking that the dataset meets certain criteria):

https://learn.microsoft.com/en-us/azure/data-factory/control-flow-validation-activity

Dataset validation directly on Alteryx output is not supported. In that case you need to store the dataset you just transformed in Alteryx somewhere, such as a blob store, and validate it there. The Validation activity usually supports file storage systems such as a data lake or (S)FTP, or a relational database.

You can also do this with data flows, but the data still needs to be stored somewhere, since ADF cannot store any files or data by itself. Once both datasets are stored, you can, for example, join them in a data flow, create a hash value for each row, and compare those values. There is a video that shows this approach:

https://www.youtube.com/watch?v=GACpvMjOJgE
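
If you just want a quick check of the two CSV files outside ADF, the same hash-and-compare idea can be sketched in plain Python. This is only a minimal sketch, not the method from the video: the function names `row_hashes` and `compare_outputs` and the sample data are made up for illustration, and it assumes both files have a header row and the same column order. Values are compared as text, so `10` and `10.0` would count as different.

```python
import csv
import hashlib
import io
from collections import Counter


def row_hashes(csv_file):
    """Return the header and a Counter of SHA-256 hashes, one per data row."""
    reader = csv.reader(csv_file)
    header = next(reader, [])  # assumption: the first line is a header row
    counts = Counter()
    for row in reader:
        # Join with a separator that is unlikely to appear in the data,
        # so ["ab", "c"] and ["a", "bc"] produce different hashes.
        joined = "\x1f".join(field.strip() for field in row)
        counts[hashlib.sha256(joined.encode("utf-8")).hexdigest()] += 1
    return header, counts


def compare_outputs(alteryx_file, adf_file):
    """Compare two CSV outputs row by row, ignoring row order."""
    alteryx_header, alteryx_rows = row_hashes(alteryx_file)
    adf_header, adf_rows = row_hashes(adf_file)
    return {
        "same_header": alteryx_header == adf_header,
        # Counter subtraction keeps only rows with a surplus on the left side.
        "only_in_alteryx": sum((alteryx_rows - adf_rows).values()),
        "only_in_adf": sum((adf_rows - alteryx_rows).values()),
    }


# Tiny in-memory example; real files would be opened with open(path, newline="").
alteryx_csv = io.StringIO("id,amount\n1,10.0\n2,20.0\n3,30.0\n")
adf_csv = io.StringIO("id,amount\n3,30.0\n1,10.0\n2,25.0\n")
result = compare_outputs(alteryx_csv, adf_csv)
print(result)
```

Both counts being zero with matching headers means the files contain the same rows, duplicates included. Because the hashes are counted rather than put in a set, a row that appears twice in one file and once in the other is still reported as a difference.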

And here is another post similar to yours:

https://learn.microsoft.com/en-us/answers/questions/856074/how-can-i-compare-source-data-(sql-server)-with-de

Later, if you want to keep doing this going forward, you can use change data capture (CDC).

Posted by huangapple on March 12, 2023, 14:28:30
Source: https://go.coder-hub.com/75711420.html