英文:
Azure Data Factory: Use Lookup result in ForEach loop with Python
问题
我的流程图如下所示:
> 编辑:我想我知道为什么那里有那些 [ ] - 来自Python脚本
> 文件名作为文件名列表返回。然后Lookup
> 活动的结果如下:
{
"count": 1,
"value": [
{
"Prop_0": "['BAT.WISCDD1C.SC.DK002001.D2023191.CSV'",
"Prop_1": " 'BAT.WISCDD1C.SC.DK002001.TOT.D2023191.CSV'",
"Prop_2": " 'BAT.WISCDD4C.SC.DK002004.D2023191.CSV'",
"Prop_3": " 'BAT.WISCDE1C.SC.DK003001.D2023191.CSV'",
"Prop_4": " 'BAT.WISCDE3C.SC.DK003003.D2023191.CSV'",
"Prop_5": " 'BAT.WISCDE4C.SC.DK003004.D2023191.CSV'",
"Prop_6": " 'BAT.WISCDE5C.SC.DK003005.D2023191.CSV'",
"Prop_7": " 'BAT.WISCDP1C.SC.DK001001.D2023191.CSV'",
"Prop_8": " 'BAT.WISCDU1C.SC.DK004011.D2023191.CSV'",
"Prop_9": " 'BAT.WISCDU4C.SC.DK004014.D2023191.CSV'",
"Prop_10": " 'BAT.WISCDU7C.SC.DK004017.D2023191.CSV']"
}
],
"effectiveIntegrationRuntime": "integrationRuntime1 (West Europe)",
"billingReference": {
"activityType": "PipelineActivity",
"billableDuration": [
{
"meterType": "ManagedVNetIR",
"duration": 0.016666666666666666,
"unit": "Hours"
}
]
},
"durationInQueue": {
"integrationRuntimeQueue": 0
}
}
ForEach循环应该遍历Prop_元素,并逐个将它们传递给我的自定义批处理Python脚本。
在批处理自定义活动中,我将项目引用为@{item()},我猜我应该这样做。然而,当Python脚本接收到参数时,结果看起来像这样!
{Prop_0:[
'BAT.WISCDD1C.SC.DK002001.D2023191.CSV',
Prop_1: 'BAT.WISCDD1C.SC.DK002001.TOT.D2023191.CSV',
Prop_2: 'BAT.WISCDD4C.SC.DK002004.D2023191.CSV',
Prop_3: 'BAT.WISCDE1C.SC.DK003001.D2023191.CSV',
Prop_4: 'BAT.WISCDE3C.SC.DK003003.D2023191.CSV',
Prop_5: 'BAT.WISCDE4C.SC.DK003004.D2023191.CSV',
Prop_6: 'BAT.WISCDE5C.SC.DK003005.D2023191.CSV',
Prop_7: 'BAT.WISCDP1C.SC.DK001001.D2023191.CSV',
Prop_8: 'BAT.WISCDU1C.SC.DK004011.D2023191.CSV',
Prop_9: 'BAT.WISCDU4C.SC.DK004014.D2023191.CSV',
Prop_10: 'BAT.WISCDU7C.SC.DK004017.D2023191.CSV']}
我不明白如何在值中使用Prop_!我希望在每次ForEach迭代中只获得一个CSV文件名!
我做错了什么?谢谢你的支持!
英文:
My pipeline looks like the following:
> Edit: I think I know why those [ ] in there - from the Python script
> the filenames are returned as a list of filenames. And the Lookup
> activity results then in the following output:
{
"count": 1,
"value": [
{
"Prop_0": "['BAT.WISCDD1C.SC.DK002001.D2023191.CSV'",
"Prop_1": " 'BAT.WISCDD1C.SC.DK002001.TOT.D2023191.CSV'",
"Prop_2": " 'BAT.WISCDD4C.SC.DK002004.D2023191.CSV'",
"Prop_3": " 'BAT.WISCDE1C.SC.DK003001.D2023191.CSV'",
"Prop_4": " 'BAT.WISCDE3C.SC.DK003003.D2023191.CSV'",
"Prop_5": " 'BAT.WISCDE4C.SC.DK003004.D2023191.CSV'",
"Prop_6": " 'BAT.WISCDE5C.SC.DK003005.D2023191.CSV'",
"Prop_7": " 'BAT.WISCDP1C.SC.DK001001.D2023191.CSV'",
"Prop_8": " 'BAT.WISCDU1C.SC.DK004011.D2023191.CSV'",
"Prop_9": " 'BAT.WISCDU4C.SC.DK004014.D2023191.CSV'",
"Prop_10": " 'BAT.WISCDU7C.SC.DK004017.D2023191.CSV']"
}
],
"effectiveIntegrationRuntime": "integrationRuntime1 (West Europe)",
"billingReference": {
"activityType": "PipelineActivity",
"billableDuration": [
{
"meterType": "ManagedVNetIR",
"duration": 0.016666666666666666,
"unit": "Hours"
}
]
},
"durationInQueue": {
"integrationRuntimeQueue": 0
}
}
The ForEach loop should walk through the Prop_ elements and pass them one by one to my Custom Batch Python script.
In the Batch Custom activity I refer to the items as @{item()} as I guess I should. However, this is how the result looks like when the Python script receives the parameter!
{Prop_0:[
'BAT.WISCDD1C.SC.DK002001.D2023191.CSV',
Prop_1: 'BAT.WISCDD1C.SC.DK002001.TOT.D2023191.CSV',
Prop_2: 'BAT.WISCDD4C.SC.DK002004.D2023191.CSV',
Prop_3: 'BAT.WISCDE1C.SC.DK003001.D2023191.CSV',
Prop_4: 'BAT.WISCDE3C.SC.DK003003.D2023191.CSV',
Prop_5: 'BAT.WISCDE4C.SC.DK003004.D2023191.CSV',
Prop_6: 'BAT.WISCDE5C.SC.DK003005.D2023191.CSV',
Prop_7: 'BAT.WISCDP1C.SC.DK001001.D2023191.CSV',
Prop_8: 'BAT.WISCDU1C.SC.DK004011.D2023191.CSV',
Prop_9: 'BAT.WISCDU4C.SC.DK004014.D2023191.CSV',
Prop_10: 'BAT.WISCDU7C.SC.DK004017.D2023191.CSV']}
I don't understand how to use the Prop_ in the value! I would like to get only one CSV filename per ForEach iteration!
What am I doing wrong? Thank you for your support!
答案1
得分: 1
根据您的要求,我将为您翻译以下内容:
根据您的要求,每次 ForEach 迭代只获取一个 CSV 文件名,您可以按照以下方法进行操作。
在查找之后,通过以下表达式获取对象中 Prop
键的数量,并将其存储在一个字符串变量中。
@string(sub(length(split(string(activity('Lookup1').output.value),':')), 1))
现在,使用这个数字创建一个数字数组,如 [0,1,2,..,10]
,并将其提供给 ForEach 活动。
@range(0,int(variables('num_props')))
在 ForEach 内部,使用以下表达式从查找活动的输出中存储文件名。
@activity('Lookup1').output.value[0][concat('Prop_',string(item()))]
这将在当前迭代中给出文件名,但它将包含特殊字符,如 'BAT.WISCDU7C.SC.DK004017.D2023191.CSV']
。
如果您想要删除这些特殊字符,可以使用以下表达式将它们替换为空字符串。
@replace(replace(replace(replace(variables('filename1'), '[', ''), ']', ''), ' ', ''), '''', '')
现在,将这个变量 filename
传递给 ForEach 活动内部的自定义批处理 Python 脚本。这将在每次迭代中将文件名传递给您的脚本。
迭代中的文件名:
供您参考的管道 JSON:
{
"name": "pipeline1",
"properties": {
"activities": [
{
"name": "Lookup1",
"type": "Lookup",
"dependsOn": [],
"policy": {
"timeout": "0.12:00:00",
"retry": 0,
"retryIntervalInSeconds": 30,
"secureOutput": false,
"secureInput": false
},
"userProperties": [],
"typeProperties": {
"source": {
"type": "DelimitedTextSource",
"storeSettings": {
"type": "AzureBlobFSReadSettings",
"recursive": true,
"enablePartitionDiscovery": false
},
"formatSettings": {
"type": "DelimitedTextReadSettings"
}
},
"dataset": {
"referenceName": "sourcecsv_ADLS",
"type": "DatasetReference"
},
"firstRowOnly": false
}
},
{
"name": "Get the number of Props",
"type": "SetVariable",
"dependsOn": [
{
"activity": "Lookup1",
"dependencyConditions": [
"Succeeded"
]
}
],
"policy": {
"secureOutput": false,
"secureInput": false
},
"userProperties": [],
"typeProperties": {
"variableName": "num_props",
"value": {
"value": "@string(sub(length(split(string(activity('Lookup1').output.value),':')), 1))",
"type": "Expression"
}
}
},
{
"name": "ForEach1",
"type": "ForEach",
"dependsOn": [
{
"activity": "Get the number of Props",
"dependencyConditions": [
"Succeeded"
]
}
],
"userProperties": [],
"typeProperties": {
"items": {
"value": "@range(0,int(variables('num_props')))",
"type": "Expression"
},
"isSequential": true,
"activities": [
{
"name": "get current File name using Prop_n",
"type": "SetVariable",
"dependsOn": [],
"policy": {
"secureOutput": false,
"secureInput": false
},
"userProperties": [],
"typeProperties": {
"variableName": "filename1",
"value": {
"value": "@activity('Lookup1').output.value[0][concat('Prop_',string(item()))]",
"type": "Expression"
}
}
},
{
"name": "Replace extra spaces and char",
"type": "SetVariable",
"dependsOn": [
{
"activity": "get current File name using Prop_n",
"dependencyConditions": [
"Succeeded"
]
}
],
"policy": {
"secureOutput": false,
"secureInput": false
},
"userProperties": [],
"typeProperties": {
"variableName": "filename",
"value": {
"value": "@replace(replace(replace(replace(variables('filename1'), '[', ''), ']', ''), ' ', ''), '''', '')",
"type": "Expression"
}
}
}
]
}
}
],
"variables": {
"num_props": {
"type": "String"
},
"filename1": {
"type": "String"
},
"filename": {
"type": "String"
}
},
"annotations": []
}
}
英文:
As your requirement is to get only one CSV filename per ForEach iteration, you can follow the below approach.
After lookup, get the number of Prop
keys in the object and storing in a string variable by using below expression.
@string(sub(length(split(string(activity('Lookup1').output.value),':')), 1))
Now, using this number, create an array of numbers like [0,1,2,..,10]
and give that to ForEach activity.
@range(0,int(variables('num_props')))
Inside ForEach, store the file name from the lookup activity output using the below expression.
@activity('Lookup1').output.value[0][concat('Prop_',string(item()))]
This will give the file name in the current iteration, but it will have the special characters like this 'BAT.WISCDU7C.SC.DK004017.D2023191.CSV']
.
If you want to remove those, use the below expression which replaces them with empty string.
@replace(replace(replace(replace(variables('filename1'), '[', ''),']',''),' ',''),'''','')
Now, give this variable filename
to your Custom Batch Python script inside ForEach activity. This will pass the filename every iteration to your script.
My Filename in an iteration:
My Pipeline JSON for your reference:
{
"name": "pipeline1",
"properties": {
"activities": [
{
"name": "Lookup1",
"type": "Lookup",
"dependsOn": [],
"policy": {
"timeout": "0.12:00:00",
"retry": 0,
"retryIntervalInSeconds": 30,
"secureOutput": false,
"secureInput": false
},
"userProperties": [],
"typeProperties": {
"source": {
"type": "DelimitedTextSource",
"storeSettings": {
"type": "AzureBlobFSReadSettings",
"recursive": true,
"enablePartitionDiscovery": false
},
"formatSettings": {
"type": "DelimitedTextReadSettings"
}
},
"dataset": {
"referenceName": "sourcecsv_ADLS",
"type": "DatasetReference"
},
"firstRowOnly": false
}
},
{
"name": "Get the number of Props",
"type": "SetVariable",
"dependsOn": [
{
"activity": "Lookup1",
"dependencyConditions": [
"Succeeded"
]
}
],
"policy": {
"secureOutput": false,
"secureInput": false
},
"userProperties": [],
"typeProperties": {
"variableName": "num_props",
"value": {
"value": "@string(sub(length(split(string(activity('Lookup1').output.value),':')), 1))",
"type": "Expression"
}
}
},
{
"name": "ForEach1",
"type": "ForEach",
"dependsOn": [
{
"activity": "Get the number of Props",
"dependencyConditions": [
"Succeeded"
]
}
],
"userProperties": [],
"typeProperties": {
"items": {
"value": "@range(0,int(variables('num_props')))",
"type": "Expression"
},
"isSequential": true,
"activities": [
{
"name": "get current File name using Prop_n",
"type": "SetVariable",
"dependsOn": [],
"policy": {
"secureOutput": false,
"secureInput": false
},
"userProperties": [],
"typeProperties": {
"variableName": "filename1",
"value": {
"value": "@activity('Lookup1').output.value[0][concat('Prop_',string(item()))]",
"type": "Expression"
}
}
},
{
"name": "Replace extra spaces and char",
"type": "SetVariable",
"dependsOn": [
{
"activity": "get current File name using Prop_n",
"dependencyConditions": [
"Succeeded"
]
}
],
"policy": {
"secureOutput": false,
"secureInput": false
},
"userProperties": [],
"typeProperties": {
"variableName": "filename",
"value": {
"value": "@replace(replace(replace(replace(variables('filename1'), '[', ''),']',''),' ',''),'''','')",
"type": "Expression"
}
}
}
]
}
}
],
"variables": {
"num_props": {
"type": "String"
},
"filename1": {
"type": "String"
},
"filename": {
"type": "String"
}
},
"annotations": []
}
}
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论