ADF copy activity


Question

I have 3 files to copy: a.txt, b.txt, and c.txt. However, a.txt and c.txt are already present in my destination (ADLS), so my Copy activity should copy only the remaining file, b.txt. How can I achieve this using ADF pipeline activities?

I have tried using Get Metadata activities on both the source and the destination to check whether the files already exist at the destination. But when the Copy activity runs inside an If activity, it copies all the files, not just the ones that are missing.

Answer 1

Score: 1

  1. Use two Get Metadata activities in parallel to get the child item lists of the source and the sink.
  2. After both Get Metadata activities succeed, use a Filter activity with the source child items array as its items and @not(contains(<sink child items array>, item())) as its condition (the shape of childItems is sketched below).
  3. Then use a ForEach activity, with the Filter activity output as the input for iteration, to copy the missing files.
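For context, the childItems field returned by a Get Metadata activity is an array of name/type objects, so the contains() check in the condition compares whole objects rather than bare file names. A sketch of the shape, with illustrative values:

{
    "childItems": [
        { "name": "a.txt", "type": "File" },
        { "name": "b.txt", "type": "File" },
        { "name": "c.txt", "type": "File" }
    ]
}

Because of this, a file in the sink only suppresses the copy when both its name and type match the source entry.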

Answer 2

Score: 1

I agree with what @Nandan has suggested; the following demonstrates the same approach. Below is the output of the Get Metadata activity on the source:

[Screenshot: childItems output of the Get Metadata activity on the source]

  • And my sink has files as shown in the image below:

[Screenshot: files present in the sink]

  • Using a Filter activity, get the filenames that are not present in your sink, and then use a ForEach activity to copy these filtered files. The following dynamic content can be used for the filter:
items: @activity('source file list').output.childItems
condition: @not(contains(activity('sink file list').output.childItems,item()))

[Screenshot: Filter activity settings and output]

  • Now, you can iterate through this filtered list using @activity('Filter1').output.Value as the items of your ForEach activity. The following is the entire pipeline JSON for the above implementation:
{
    "name": "pipeline2",
    "properties": {
        "activities": [
            {
                "name": "source file list",
                "type": "GetMetadata",
                "dependsOn": [],
                "policy": {
                    "timeout": "0.12:00:00",
                    "retry": 0,
                    "retryIntervalInSeconds": 30,
                    "secureOutput": false,
                    "secureInput": false
                },
                "userProperties": [],
                "typeProperties": {
                    "dataset": {
                        "referenceName": "DelimitedText1",
                        "type": "DatasetReference"
                    },
                    "fieldList": [
                        "childItems"
                    ],
                    "storeSettings": {
                        "type": "AzureBlobFSReadSettings",
                        "enablePartitionDiscovery": false
                    },
                    "formatSettings": {
                        "type": "DelimitedTextReadSettings"
                    }
                }
            },
            {
                "name": "sink file list",
                "type": "GetMetadata",
                "dependsOn": [],
                "policy": {
                    "timeout": "0.12:00:00",
                    "retry": 0,
                    "retryIntervalInSeconds": 30,
                    "secureOutput": false,
                    "secureInput": false
                },
                "userProperties": [],
                "typeProperties": {
                    "dataset": {
                        "referenceName": "DelimitedText2",
                        "type": "DatasetReference"
                    },
                    "fieldList": [
                        "childItems"
                    ],
                    "storeSettings": {
                        "type": "AzureBlobFSReadSettings",
                        "enablePartitionDiscovery": false
                    },
                    "formatSettings": {
                        "type": "DelimitedTextReadSettings"
                    }
                }
            },
            {
                "name": "Filter1",
                "type": "Filter",
                "dependsOn": [
                    {
                        "activity": "source file list",
                        "dependencyConditions": [
                            "Succeeded"
                        ]
                    },
                    {
                        "activity": "sink file list",
                        "dependencyConditions": [
                            "Succeeded"
                        ]
                    }
                ],
                "userProperties": [],
                "typeProperties": {
                    "items": {
                        "value": "@activity('source file list').output.childItems",
                        "type": "Expression"
                    },
                    "condition": {
                        "value": "@not(contains(activity('sink file list').output.childItems,item()))",
                        "type": "Expression"
                    }
                }
            },
            {
                "name": "ForEach1",
                "type": "ForEach",
                "dependsOn": [
                    {
                        "activity": "Filter1",
                        "dependencyConditions": [
                            "Succeeded"
                        ]
                    }
                ],
                "userProperties": [],
                "typeProperties": {
                    "items": {
                        "value": "@activity('Filter1').output.Value",
                        "type": "Expression"
                    },
                    "isSequential": true,
                    "activities": [
                        {
                            "name": "Copy data1",
                            "type": "Copy",
                            "dependsOn": [],
                            "policy": {
                                "timeout": "0.12:00:00",
                                "retry": 0,
                                "retryIntervalInSeconds": 30,
                                "secureOutput": false,
                                "secureInput": false
                            },
                            "userProperties": [],
                            "typeProperties": {
                                "source": {
                                    "type": "DelimitedTextSource",
                                    "storeSettings": {
                                        "type": "AzureBlobFSReadSettings",
                                        "recursive": true,
                                        "enablePartitionDiscovery": false
                                    },
                                    "formatSettings": {
                                        "type": "DelimitedTextReadSettings"
                                    }
                                },
                                "sink": {
                                    "type": "DelimitedTextSink",
                                    "storeSettings": {
                                        "type": "AzureBlobFSWriteSettings"
                                    },
                                    "formatSettings": {
                                        "type": "DelimitedTextWriteSettings",
                                        "quoteAllText": true,
                                        "fileExtension": ".txt"
                                    }
                                },
                                "enableStaging": false,
                                "translator": {
                                    "type": "TabularTranslator",
                                    "typeConversion": true,
                                    "typeConversionSettings": {
                                        "allowDataTruncation": true,
                                        "treatBooleanAsNumber": false
                                    }
                                }
                            },
                            "inputs": [
                                {
                                    "referenceName": "DelimitedText3",
                                    "type": "DatasetReference"
                                }
                            ],
                            "outputs": [
                                {
                                    "referenceName": "DelimitedText4",
                                    "type": "DatasetReference"
                                }
                            ]
                        }
                    ]
                }
            }
        ],
        "annotations": []
    }
}
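One detail the JSON above does not show: for each iteration to copy only the current file, the source and sink datasets would typically expose a file-name parameter that the Copy activity sets from the loop item (for example, @item().name). A hypothetical fragment for the source dataset along those lines; the parameter name, linked service reference, and fileSystem value are assumptions, not taken from the original pipeline:

{
    "name": "DelimitedText3",
    "properties": {
        "type": "DelimitedText",
        "linkedServiceName": {
            "referenceName": "AzureDataLakeStorage1",
            "type": "LinkedServiceReference"
        },
        "parameters": {
            "fileName": {
                "type": "string"
            }
        },
        "typeProperties": {
            "location": {
                "type": "AzureBlobFSLocation",
                "fileName": {
                    "value": "@dataset().fileName",
                    "type": "Expression"
                },
                "fileSystem": "source"
            }
        }
    }
}

The Copy activity's inputs entry would then pass "parameters": { "fileName": "@item().name" }, and the sink dataset would be parameterized the same way.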
  • Since the end goal here would result in the sink having all the files that the source has, another approach you can consider is to use a Delete activity inside your ForEach to delete the file in the sink first and then copy it.

  • First get the list of files present in the source using a Get Metadata activity. Iterate through this list, first applying the delete operation on each file and then the copy, as sketched after this list.
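A minimal sketch of that Delete activity as it could sit inside the ForEach, ahead of the copy; the activity name is made up, and DelimitedText4 stands in for a sink dataset resolved to the current file:

{
    "name": "Delete sink file",
    "type": "Delete",
    "dependsOn": [],
    "typeProperties": {
        "dataset": {
            "referenceName": "DelimitedText4",
            "type": "DatasetReference"
        },
        "enableLogging": false,
        "storeSettings": {
            "type": "AzureBlobFSReadSettings",
            "recursive": false
        }
    }
}

The Copy activity inside the loop would then list this Delete activity in its dependsOn with the Succeeded condition, so each file is removed from the sink just before it is re-copied.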

NOTE: If the content of files that already exist in both source and sink differs, go with the approach demonstrated above (as suggested by @Nandan).
