Iterate on files generated in a single Data Flow to send to an API - Azure Data Factory
Question
My question is quite complicated and I think I don't have a good approach, but you will tell me, I'm sure. ;)
I'm in Azure Data Factory.
I have a Data Flow that generates several files. I partitioned with "Name file as column data".
These files are CSV and are stored in Azure Storage.
After that, in the pipeline, I get all the files in the dataset by using Get Metadata with the "Child Items" argument. "Resource" is the dataset that I also used in the Data Flow.
Then, in a ForEach, I take all the files in the dataset, meaning all the files in a specific folder in Azure Storage, and send each one to an API.
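For context, here is a minimal sketch of that wiring as pipeline JSON (the activity names `Get Metadata1`, `ForEach1`, and `Call API`, and the API URL, are illustrative assumptions, not taken from my actual pipeline):

```json
[
  {
    "name": "Get Metadata1",
    "type": "GetMetadata",
    "typeProperties": {
      "dataset": { "referenceName": "Resource", "type": "DatasetReference" },
      "fieldList": [ "childItems" ]
    }
  },
  {
    "name": "ForEach1",
    "type": "ForEach",
    "dependsOn": [ { "activity": "Get Metadata1", "dependencyConditions": [ "Succeeded" ] } ],
    "typeProperties": {
      "items": { "value": "@activity('Get Metadata1').output.childItems", "type": "Expression" },
      "activities": [
        {
          "name": "Call API",
          "type": "WebActivity",
          "typeProperties": {
            "url": "https://example.com/process",
            "method": "POST",
            "body": { "value": "{ \"fileName\": \"@{item().name}\" }", "type": "Expression" }
          }
        }
      ]
    }
  }
]
```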
The API processes each file: the file name is passed to it, it does its things, and it moves the file to an Archive folder.
The problem is that the API may not move a file to the Archive folder because an error occurred during processing. If that happens, the next pipeline run will call the API for that file again, because the file is still in the dataset folder.
I would like this to be more robust: I would like the ForEach to only process the files just created by the Data Flow.
If you have an idea of how to do that, I would be grateful. :)
To be honest, I am not sure the approach as it stands is a good one.
I tried to pass a file name to Get Metadata, hoping to iterate over a list of file names that way, but I don't think it's possible to send a list.
Answer 1
Score: 0
I am able to achieve your requirement by using **Filter by last modified** in the Get Metadata activity, and credit to **@[Joel Cochran](https://stackoverflow.com/users/75838/joel-cochran "7,069 reputation")** for the suggestion.
These are the files before the pipeline run.
Create a string variable with `@utcNow()` before the Data Flow activity. After the Data Flow activity, use that variable as the start date of **Filter by last modified** in the Get Metadata activity, and give `@utcNow()` as the end date, like below.
This will filter down to only the files that were created by the Data Flow.
These are the files in my target location after the pipeline run.
This is the Child Items array of the Get Metadata activity.
You can pass this list to your ForEach activity.
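For example, the ForEach items field can be set to the expression below (again assuming the Get Metadata activity is named `Get Metadata1`); inside the loop, `@item().name` then gives each file name to pass to the API:

```json
{ "value": "@activity('Get Metadata1').output.childItems", "type": "Expression" }
```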