Azure数据工厂:在Python的ForEach循环中使用Lookup结果

huangapple go评论78阅读模式
英文:

Azure Data Factory: Use Lookup result in ForEach loop with Python

问题

我的流程图如下所示:

Azure数据工厂:在Python的ForEach循环中使用Lookup结果

> 编辑:我想我知道为什么那里有那些 [ ] - 来自Python脚本
> 文件名作为文件名列表返回。然后Lookup
> 活动的结果如下:

 {
                   "count": 1,
                   "value": [
                                     {
                                                        "Prop_0": "['BAT.WISCDD1C.SC.DK002001.D2023191.CSV'",
                                                        "Prop_1": " 'BAT.WISCDD1C.SC.DK002001.TOT.D2023191.CSV'",
                                                        "Prop_2": " 'BAT.WISCDD4C.SC.DK002004.D2023191.CSV'",
                                                        "Prop_3": " 'BAT.WISCDE1C.SC.DK003001.D2023191.CSV'",
                                                        "Prop_4": " 'BAT.WISCDE3C.SC.DK003003.D2023191.CSV'",
                                                        "Prop_5": " 'BAT.WISCDE4C.SC.DK003004.D2023191.CSV'",
                                                        "Prop_6": " 'BAT.WISCDE5C.SC.DK003005.D2023191.CSV'",
                                                        "Prop_7": " 'BAT.WISCDP1C.SC.DK001001.D2023191.CSV'",
                                                        "Prop_8": " 'BAT.WISCDU1C.SC.DK004011.D2023191.CSV'",
                                                        "Prop_9": " 'BAT.WISCDU4C.SC.DK004014.D2023191.CSV'",
                                                        "Prop_10": " 'BAT.WISCDU7C.SC.DK004017.D2023191.CSV']"
                                     }
                   ],
                   "effectiveIntegrationRuntime": "integrationRuntime1 (West Europe)",
                   "billingReference": {
                                     "activityType": "PipelineActivity",
                                     "billableDuration": [
                                                        {
                                                                           "meterType": "ManagedVNetIR",
                                                                           "duration": 0.016666666666666666,
                                                                           "unit": "Hours"
                                                        }
                                     ]
                   },
                   "durationInQueue": {
                                     "integrationRuntimeQueue": 0
                   }
}

ForEach循环应该遍历Prop_元素,并逐个将它们传递给我的自定义批处理Python脚本。

Azure数据工厂:在Python的ForEach循环中使用Lookup结果

在批处理自定义活动中,我将项目引用为@{item()},我猜我应该这样做。然而,当Python脚本接收到参数时,结果看起来像这样!

    {Prop_0:[
'BAT.WISCDD1C.SC.DK002001.D2023191.CSV',
    Prop_1: 'BAT.WISCDD1C.SC.DK002001.TOT.D2023191.CSV',
    Prop_2: 'BAT.WISCDD4C.SC.DK002004.D2023191.CSV',
    Prop_3: 'BAT.WISCDE1C.SC.DK003001.D2023191.CSV',
    Prop_4: 'BAT.WISCDE3C.SC.DK003003.D2023191.CSV',
    Prop_5: 'BAT.WISCDE4C.SC.DK003004.D2023191.CSV',
    Prop_6: 'BAT.WISCDE5C.SC.DK003005.D2023191.CSV',
    Prop_7: 'BAT.WISCDP1C.SC.DK001001.D2023191.CSV',
    Prop_8: 'BAT.WISCDU1C.SC.DK004011.D2023191.CSV',
    Prop_9: 'BAT.WISCDU4C.SC.DK004014.D2023191.CSV',
    Prop_10: 'BAT.WISCDU7C.SC.DK004017.D2023191.CSV']}

我不明白如何在值中使用Prop_!我希望在每次ForEach迭代中只获得一个CSV文件名!

我做错了什么?谢谢你的支持!

英文:

My pipeline looks like the following:

Azure数据工厂:在Python的ForEach循环中使用Lookup结果

> Edit: I think I know why those [ ] in there - from the Python script
> the filenames are returned as a list of filenames. And the Lookup
> activity results then in the following output:

 {
                   "count": 1,
                   "value": [
                                     {
                                                        "Prop_0": "['BAT.WISCDD1C.SC.DK002001.D2023191.CSV'",
                                                        "Prop_1": " 'BAT.WISCDD1C.SC.DK002001.TOT.D2023191.CSV'",
                                                        "Prop_2": " 'BAT.WISCDD4C.SC.DK002004.D2023191.CSV'",
                                                        "Prop_3": " 'BAT.WISCDE1C.SC.DK003001.D2023191.CSV'",
                                                        "Prop_4": " 'BAT.WISCDE3C.SC.DK003003.D2023191.CSV'",
                                                        "Prop_5": " 'BAT.WISCDE4C.SC.DK003004.D2023191.CSV'",
                                                        "Prop_6": " 'BAT.WISCDE5C.SC.DK003005.D2023191.CSV'",
                                                        "Prop_7": " 'BAT.WISCDP1C.SC.DK001001.D2023191.CSV'",
                                                        "Prop_8": " 'BAT.WISCDU1C.SC.DK004011.D2023191.CSV'",
                                                        "Prop_9": " 'BAT.WISCDU4C.SC.DK004014.D2023191.CSV'",
                                                        "Prop_10": " 'BAT.WISCDU7C.SC.DK004017.D2023191.CSV']"
                                     }
                   ],
                   "effectiveIntegrationRuntime": "integrationRuntime1 (West Europe)",
                   "billingReference": {
                                     "activityType": "PipelineActivity",
                                     "billableDuration": [
                                                        {
                                                                           "meterType": "ManagedVNetIR",
                                                                           "duration": 0.016666666666666666,
                                                                           "unit": "Hours"
                                                        }
                                     ]
                   },
                   "durationInQueue": {
                                     "integrationRuntimeQueue": 0
                   }
}

The ForEach loop should walk through the Prop_ elements and pass them one by one to my Custom Batch Python script.

Azure数据工厂:在Python的ForEach循环中使用Lookup结果

In the Batch Custom activity I refer to the items as @{item()} as I guess I should. However, this is how the result looks like when the Python script receives the parameter!

    {Prop_0:[
'BAT.WISCDD1C.SC.DK002001.D2023191.CSV',
    Prop_1: 'BAT.WISCDD1C.SC.DK002001.TOT.D2023191.CSV',
    Prop_2: 'BAT.WISCDD4C.SC.DK002004.D2023191.CSV',
    Prop_3: 'BAT.WISCDE1C.SC.DK003001.D2023191.CSV',
    Prop_4: 'BAT.WISCDE3C.SC.DK003003.D2023191.CSV',
    Prop_5: 'BAT.WISCDE4C.SC.DK003004.D2023191.CSV',
    Prop_6: 'BAT.WISCDE5C.SC.DK003005.D2023191.CSV',
    Prop_7: 'BAT.WISCDP1C.SC.DK001001.D2023191.CSV',
    Prop_8: 'BAT.WISCDU1C.SC.DK004011.D2023191.CSV',
    Prop_9: 'BAT.WISCDU4C.SC.DK004014.D2023191.CSV',
    Prop_10: 'BAT.WISCDU7C.SC.DK004017.D2023191.CSV']}

I don't understand how to use the Prop_ in the value! I would like to get only one CSV filename per ForEach iteration!

What am I doing wrong? Thank you for your support!

答案1

得分: 1

根据您的要求,我将为您翻译以下内容:

根据您的要求,每次 ForEach 迭代只获取一个 CSV 文件名,您可以按照以下方法进行操作。

在查找之后,通过以下表达式获取对象中 Prop 键的数量,并将其存储在一个字符串变量中。

@string(sub(length(split(string(activity('Lookup1').output.value),':')), 1))

Azure数据工厂:在Python的ForEach循环中使用Lookup结果

现在,使用这个数字创建一个数字数组,如 [0,1,2,..,10],并将其提供给 ForEach 活动。

@range(0,int(variables('num_props')))

Azure数据工厂:在Python的ForEach循环中使用Lookup结果

在 ForEach 内部,使用以下表达式从查找活动的输出中存储文件名。

@activity('Lookup1').output.value[0][concat('Prop_',string(item()))]

Azure数据工厂:在Python的ForEach循环中使用Lookup结果

这将在当前迭代中给出文件名,但它将包含特殊字符,如 'BAT.WISCDU7C.SC.DK004017.D2023191.CSV']

如果您想要删除这些特殊字符,可以使用以下表达式将它们替换为空字符串。

@replace(replace(replace(replace(variables('filename1'), '[', ''), ']', ''), ' ', ''), '''', '')

Azure数据工厂:在Python的ForEach循环中使用Lookup结果

现在,将这个变量 filename 传递给 ForEach 活动内部的自定义批处理 Python 脚本。这将在每次迭代中将文件名传递给您的脚本。

迭代中的文件名:

Azure数据工厂:在Python的ForEach循环中使用Lookup结果

供您参考的管道 JSON:

{
    "name": "pipeline1",
    "properties": {
        "activities": [
            {
                "name": "Lookup1",
                "type": "Lookup",
                "dependsOn": [],
                "policy": {
                    "timeout": "0.12:00:00",
                    "retry": 0,
                    "retryIntervalInSeconds": 30,
                    "secureOutput": false,
                    "secureInput": false
                },
                "userProperties": [],
                "typeProperties": {
                    "source": {
                        "type": "DelimitedTextSource",
                        "storeSettings": {
                            "type": "AzureBlobFSReadSettings",
                            "recursive": true,
                            "enablePartitionDiscovery": false
                        },
                        "formatSettings": {
                            "type": "DelimitedTextReadSettings"
                        }
                    },
                    "dataset": {
                        "referenceName": "sourcecsv_ADLS",
                        "type": "DatasetReference"
                    },
                    "firstRowOnly": false
                }
            },
            {
                "name": "Get the number of Props",
                "type": "SetVariable",
                "dependsOn": [
                    {
                        "activity": "Lookup1",
                        "dependencyConditions": [
                            "Succeeded"
                        ]
                    }
                ],
                "policy": {
                    "secureOutput": false,
                    "secureInput": false
                },
                "userProperties": [],
                "typeProperties": {
                    "variableName": "num_props",
                    "value": {
                        "value": "@string(sub(length(split(string(activity('Lookup1').output.value),':')), 1))",
                        "type": "Expression"
                    }
                }
            },
            {
                "name": "ForEach1",
                "type": "ForEach",
                "dependsOn": [
                    {
                        "activity": "Get the number of Props",
                        "dependencyConditions": [
                            "Succeeded"
                        ]
                    }
                ],
                "userProperties": [],
                "typeProperties": {
                    "items": {
                        "value": "@range(0,int(variables('num_props')))",
                        "type": "Expression"
                    },
                    "isSequential": true,
                    "activities": [
                        {
                            "name": "get current File name using Prop_n",
                            "type": "SetVariable",
                            "dependsOn": [],
                            "policy": {
                                "secureOutput": false,
                                "secureInput": false
                            },
                            "userProperties": [],
                            "typeProperties": {
                                "variableName": "filename1",
                                "value": {
                                    "value": "@activity('Lookup1').output.value[0][concat('Prop_',string(item()))]",
                                    "type": "Expression"
                                }
                            }
                        },
                        {
                            "name": "Replace extra spaces and char",
                            "type": "SetVariable",
                            "dependsOn": [
                                {
                                    "activity": "get current File name using Prop_n",
                                    "dependencyConditions": [
                                        "Succeeded"
                                    ]
                                }
                            ],
                            "policy": {
                                "secureOutput": false,
                                "secureInput": false
                            },
                            "userProperties": [],
                            "typeProperties": {
                                "variableName": "filename",
                                "value": {
                                    "value": "@replace(replace(replace(replace(variables('filename1'), '[', ''), ']', ''), ' ', ''), '''', '')",
                                    "type": "Expression"
                                }
                            }
                        }
                    ]
                }
            }
        ],
        "variables": {
            "num_props": {
                "type": "String"
            },
            "filename1": {
                "type": "String"
            },
            "filename": {
                "type": "String"
            }
        },
        "annotations": []
    }
}
英文:

As your requirement is to get only one CSV filename per ForEach iteration, you can follow the below approach.

After lookup, get the number of Prop keys in the object and storing in a string variable by using below expression.

@string(sub(length(split(string(activity('Lookup1').output.value),':')), 1))

Azure数据工厂:在Python的ForEach循环中使用Lookup结果

Now, using this number, create an array of numbers like [0,1,2,..,10] and give that to ForEach activity.

@range(0,int(variables('num_props')))

Azure数据工厂:在Python的ForEach循环中使用Lookup结果

Inside ForEach, store the file name from the lookup activity output using the below expression.

@activity('Lookup1').output.value[0][concat('Prop_',string(item()))]

Azure数据工厂:在Python的ForEach循环中使用Lookup结果

This will give the file name in the current iteration, but it will have the special characters like this 'BAT.WISCDU7C.SC.DK004017.D2023191.CSV'].

If you want to remove those, use the below expression which replaces them with empty string.

@replace(replace(replace(replace(variables('filename1'), '[', ''),']',''),' ',''),'''','')

Azure数据工厂:在Python的ForEach循环中使用Lookup结果

Now, give this variable filename to your Custom Batch Python script inside ForEach activity. This will pass the filename every iteration to your script.

My Filename in an iteration:

Azure数据工厂:在Python的ForEach循环中使用Lookup结果

My Pipeline JSON for your reference:

{
"name": "pipeline1",
"properties": {
"activities": [
{
"name": "Lookup1",
"type": "Lookup",
"dependsOn": [],
"policy": {
"timeout": "0.12:00:00",
"retry": 0,
"retryIntervalInSeconds": 30,
"secureOutput": false,
"secureInput": false
},
"userProperties": [],
"typeProperties": {
"source": {
"type": "DelimitedTextSource",
"storeSettings": {
"type": "AzureBlobFSReadSettings",
"recursive": true,
"enablePartitionDiscovery": false
},
"formatSettings": {
"type": "DelimitedTextReadSettings"
}
},
"dataset": {
"referenceName": "sourcecsv_ADLS",
"type": "DatasetReference"
},
"firstRowOnly": false
}
},
{
"name": "Get the number of Props",
"type": "SetVariable",
"dependsOn": [
{
"activity": "Lookup1",
"dependencyConditions": [
"Succeeded"
]
}
],
"policy": {
"secureOutput": false,
"secureInput": false
},
"userProperties": [],
"typeProperties": {
"variableName": "num_props",
"value": {
"value": "@string(sub(length(split(string(activity('Lookup1').output.value),':')), 1))",
"type": "Expression"
}
}
},
{
"name": "ForEach1",
"type": "ForEach",
"dependsOn": [
{
"activity": "Get the number of Props",
"dependencyConditions": [
"Succeeded"
]
}
],
"userProperties": [],
"typeProperties": {
"items": {
"value": "@range(0,int(variables('num_props')))",
"type": "Expression"
},
"isSequential": true,
"activities": [
{
"name": "get current File name using Prop_n",
"type": "SetVariable",
"dependsOn": [],
"policy": {
"secureOutput": false,
"secureInput": false
},
"userProperties": [],
"typeProperties": {
"variableName": "filename1",
"value": {
"value": "@activity('Lookup1').output.value[0][concat('Prop_',string(item()))]",
"type": "Expression"
}
}
},
{
"name": "Replace extra spaces and char",
"type": "SetVariable",
"dependsOn": [
{
"activity": "get current File name using Prop_n",
"dependencyConditions": [
"Succeeded"
]
}
],
"policy": {
"secureOutput": false,
"secureInput": false
},
"userProperties": [],
"typeProperties": {
"variableName": "filename",
"value": {
"value": "@replace(replace(replace(replace(variables('filename1'), '[', ''),']',''),' ',''),'''','')",
"type": "Expression"
}
}
}
]
}
}
],
"variables": {
"num_props": {
"type": "String"
},
"filename1": {
"type": "String"
},
"filename": {
"type": "String"
}
},
"annotations": []
}
}

huangapple
  • 本文由 发表于 2023年8月9日 17:38:11
  • 转载请务必保留本文链接:https://go.coder-hub.com/76866435.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定