2023年8月9日 17:38:11go评论84阅读模式

英文:

Azure Data Factory: Use Lookup result in ForEach loop with Python

问题

我的流程图如下所示：

> 编辑：我想我知道为什么那里有那些 [ ] - 来自Python脚本
> 文件名作为文件名列表返回。然后Lookup
> 活动的结果如下：

 {
                   &quot;count&quot;: 1,
                   &quot;value&quot;: [
                                     {
                                                        &quot;Prop_0&quot;: &quot;[&#39;BAT.WISCDD1C.SC.DK002001.D2023191.CSV&#39;&quot;,
                                                        &quot;Prop_1&quot;: &quot; &#39;BAT.WISCDD1C.SC.DK002001.TOT.D2023191.CSV&#39;&quot;,
                                                        &quot;Prop_2&quot;: &quot; &#39;BAT.WISCDD4C.SC.DK002004.D2023191.CSV&#39;&quot;,
                                                        &quot;Prop_3&quot;: &quot; &#39;BAT.WISCDE1C.SC.DK003001.D2023191.CSV&#39;&quot;,
                                                        &quot;Prop_4&quot;: &quot; &#39;BAT.WISCDE3C.SC.DK003003.D2023191.CSV&#39;&quot;,
                                                        &quot;Prop_5&quot;: &quot; &#39;BAT.WISCDE4C.SC.DK003004.D2023191.CSV&#39;&quot;,
                                                        &quot;Prop_6&quot;: &quot; &#39;BAT.WISCDE5C.SC.DK003005.D2023191.CSV&#39;&quot;,
                                                        &quot;Prop_7&quot;: &quot; &#39;BAT.WISCDP1C.SC.DK001001.D2023191.CSV&#39;&quot;,
                                                        &quot;Prop_8&quot;: &quot; &#39;BAT.WISCDU1C.SC.DK004011.D2023191.CSV&#39;&quot;,
                                                        &quot;Prop_9&quot;: &quot; &#39;BAT.WISCDU4C.SC.DK004014.D2023191.CSV&#39;&quot;,
                                                        &quot;Prop_10&quot;: &quot; &#39;BAT.WISCDU7C.SC.DK004017.D2023191.CSV&#39;]&quot;
                                     }
                   ],
                   &quot;effectiveIntegrationRuntime&quot;: &quot;integrationRuntime1 (West Europe)&quot;,
                   &quot;billingReference&quot;: {
                                     &quot;activityType&quot;: &quot;PipelineActivity&quot;,
                                     &quot;billableDuration&quot;: [
                                                        {
                                                                           &quot;meterType&quot;: &quot;ManagedVNetIR&quot;,
                                                                           &quot;duration&quot;: 0.016666666666666666,
                                                                           &quot;unit&quot;: &quot;Hours&quot;
                                                        }
                                     ]
                   },
                   &quot;durationInQueue&quot;: {
                                     &quot;integrationRuntimeQueue&quot;: 0
                   }
}

ForEach循环应该遍历Prop_元素，并逐个将它们传递给我的自定义批处理Python脚本。

在批处理自定义活动中，我将项目引用为@{item()}，我猜我应该这样做。然而，当Python脚本接收到参数时，结果看起来像这样！

    {Prop_0:[
&#39;BAT.WISCDD1C.SC.DK002001.D2023191.CSV&#39;,
    Prop_1: &#39;BAT.WISCDD1C.SC.DK002001.TOT.D2023191.CSV&#39;,
    Prop_2: &#39;BAT.WISCDD4C.SC.DK002004.D2023191.CSV&#39;,
    Prop_3: &#39;BAT.WISCDE1C.SC.DK003001.D2023191.CSV&#39;,
    Prop_4: &#39;BAT.WISCDE3C.SC.DK003003.D2023191.CSV&#39;,
    Prop_5: &#39;BAT.WISCDE4C.SC.DK003004.D2023191.CSV&#39;,
    Prop_6: &#39;BAT.WISCDE5C.SC.DK003005.D2023191.CSV&#39;,
    Prop_7: &#39;BAT.WISCDP1C.SC.DK001001.D2023191.CSV&#39;,
    Prop_8: &#39;BAT.WISCDU1C.SC.DK004011.D2023191.CSV&#39;,
    Prop_9: &#39;BAT.WISCDU4C.SC.DK004014.D2023191.CSV&#39;,
    Prop_10: &#39;BAT.WISCDU7C.SC.DK004017.D2023191.CSV&#39;]}

我不明白如何在值中使用Prop_！我希望在每次ForEach迭代中只获得一个CSV文件名！

我做错了什么？谢谢你的支持！

英文:

My pipeline looks like the following:

> Edit: I think I know why those [ ] in there - from the Python script
> the filenames are returned as a list of filenames. And the Lookup
> activity results then in the following output:

 {
                   &quot;count&quot;: 1,
                   &quot;value&quot;: [
                                     {
                                                        &quot;Prop_0&quot;: &quot;[&#39;BAT.WISCDD1C.SC.DK002001.D2023191.CSV&#39;&quot;,
                                                        &quot;Prop_1&quot;: &quot; &#39;BAT.WISCDD1C.SC.DK002001.TOT.D2023191.CSV&#39;&quot;,
                                                        &quot;Prop_2&quot;: &quot; &#39;BAT.WISCDD4C.SC.DK002004.D2023191.CSV&#39;&quot;,
                                                        &quot;Prop_3&quot;: &quot; &#39;BAT.WISCDE1C.SC.DK003001.D2023191.CSV&#39;&quot;,
                                                        &quot;Prop_4&quot;: &quot; &#39;BAT.WISCDE3C.SC.DK003003.D2023191.CSV&#39;&quot;,
                                                        &quot;Prop_5&quot;: &quot; &#39;BAT.WISCDE4C.SC.DK003004.D2023191.CSV&#39;&quot;,
                                                        &quot;Prop_6&quot;: &quot; &#39;BAT.WISCDE5C.SC.DK003005.D2023191.CSV&#39;&quot;,
                                                        &quot;Prop_7&quot;: &quot; &#39;BAT.WISCDP1C.SC.DK001001.D2023191.CSV&#39;&quot;,
                                                        &quot;Prop_8&quot;: &quot; &#39;BAT.WISCDU1C.SC.DK004011.D2023191.CSV&#39;&quot;,
                                                        &quot;Prop_9&quot;: &quot; &#39;BAT.WISCDU4C.SC.DK004014.D2023191.CSV&#39;&quot;,
                                                        &quot;Prop_10&quot;: &quot; &#39;BAT.WISCDU7C.SC.DK004017.D2023191.CSV&#39;]&quot;
                                     }
                   ],
                   &quot;effectiveIntegrationRuntime&quot;: &quot;integrationRuntime1 (West Europe)&quot;,
                   &quot;billingReference&quot;: {
                                     &quot;activityType&quot;: &quot;PipelineActivity&quot;,
                                     &quot;billableDuration&quot;: [
                                                        {
                                                                           &quot;meterType&quot;: &quot;ManagedVNetIR&quot;,
                                                                           &quot;duration&quot;: 0.016666666666666666,
                                                                           &quot;unit&quot;: &quot;Hours&quot;
                                                        }
                                     ]
                   },
                   &quot;durationInQueue&quot;: {
                                     &quot;integrationRuntimeQueue&quot;: 0
                   }
}

The ForEach loop should walk through the Prop_ elements and pass them one by one to my Custom Batch Python script.

In the Batch Custom activity I refer to the items as @{item()} as I guess I should. However, this is how the result looks like when the Python script receives the parameter!

    {Prop_0:[
&#39;BAT.WISCDD1C.SC.DK002001.D2023191.CSV&#39;,
    Prop_1: &#39;BAT.WISCDD1C.SC.DK002001.TOT.D2023191.CSV&#39;,
    Prop_2: &#39;BAT.WISCDD4C.SC.DK002004.D2023191.CSV&#39;,
    Prop_3: &#39;BAT.WISCDE1C.SC.DK003001.D2023191.CSV&#39;,
    Prop_4: &#39;BAT.WISCDE3C.SC.DK003003.D2023191.CSV&#39;,
    Prop_5: &#39;BAT.WISCDE4C.SC.DK003004.D2023191.CSV&#39;,
    Prop_6: &#39;BAT.WISCDE5C.SC.DK003005.D2023191.CSV&#39;,
    Prop_7: &#39;BAT.WISCDP1C.SC.DK001001.D2023191.CSV&#39;,
    Prop_8: &#39;BAT.WISCDU1C.SC.DK004011.D2023191.CSV&#39;,
    Prop_9: &#39;BAT.WISCDU4C.SC.DK004014.D2023191.CSV&#39;,
    Prop_10: &#39;BAT.WISCDU7C.SC.DK004017.D2023191.CSV&#39;]}

I don't understand how to use the Prop_ in the value! I would like to get only one CSV filename per ForEach iteration!

What am I doing wrong? Thank you for your support!

答案1

得分: 1

根据您的要求，我将为您翻译以下内容：

根据您的要求，每次 ForEach 迭代只获取一个 CSV 文件名，您可以按照以下方法进行操作。

在查找之后，通过以下表达式获取对象中 Prop 键的数量，并将其存储在一个字符串变量中。

@string(sub(length(split(string(activity('Lookup1').output.value),':')), 1))

Azure数据工厂：在Python的ForEach循环中使用Lookup结果

现在，使用这个数字创建一个数字数组，如 [0,1,2,..,10]，并将其提供给 ForEach 活动。

@range(0,int(variables('num_props')))

Azure数据工厂：在Python的ForEach循环中使用Lookup结果

在 ForEach 内部，使用以下表达式从查找活动的输出中存储文件名。

@activity('Lookup1').output.value[0][concat('Prop_',string(item()))]

Azure数据工厂：在Python的ForEach循环中使用Lookup结果

这将在当前迭代中给出文件名，但它将包含特殊字符，如 'BAT.WISCDU7C.SC.DK004017.D2023191.CSV']。

如果您想要删除这些特殊字符，可以使用以下表达式将它们替换为空字符串。

@replace(replace(replace(replace(variables('filename1'), '[', ''), ']', ''), ' ', ''), '''', '')

Azure数据工厂：在Python的ForEach循环中使用Lookup结果

现在，将这个变量 filename 传递给 ForEach 活动内部的自定义批处理 Python 脚本。这将在每次迭代中将文件名传递给您的脚本。

迭代中的文件名：

Azure数据工厂：在Python的ForEach循环中使用Lookup结果

供您参考的管道 JSON：

{
    "name": "pipeline1",
    "properties": {
        "activities": [
            {
                "name": "Lookup1",
                "type": "Lookup",
                "dependsOn": [],
                "policy": {
                    "timeout": "0.12:00:00",
                    "retry": 0,
                    "retryIntervalInSeconds": 30,
                    "secureOutput": false,
                    "secureInput": false
                },
                "userProperties": [],
                "typeProperties": {
                    "source": {
                        "type": "DelimitedTextSource",
                        "storeSettings": {
                            "type": "AzureBlobFSReadSettings",
                            "recursive": true,
                            "enablePartitionDiscovery": false
                        },
                        "formatSettings": {
                            "type": "DelimitedTextReadSettings"
                        }
                    },
                    "dataset": {
                        "referenceName": "sourcecsv_ADLS",
                        "type": "DatasetReference"
                    },
                    "firstRowOnly": false
                }
            },
            {
                "name": "Get the number of Props",
                "type": "SetVariable",
                "dependsOn": [
                    {
                        "activity": "Lookup1",
                        "dependencyConditions": [
                            "Succeeded"
                        ]
                    }
                ],
                "policy": {
                    "secureOutput": false,
                    "secureInput": false
                },
                "userProperties": [],
                "typeProperties": {
                    "variableName": "num_props",
                    "value": {
                        "value": "@string(sub(length(split(string(activity('Lookup1').output.value),':')), 1))",
                        "type": "Expression"
                    }
                }
            },
            {
                "name": "ForEach1",
                "type": "ForEach",
                "dependsOn": [
                    {
                        "activity": "Get the number of Props",
                        "dependencyConditions": [
                            "Succeeded"
                        ]
                    }
                ],
                "userProperties": [],
                "typeProperties": {
                    "items": {
                        "value": "@range(0,int(variables('num_props')))",
                        "type": "Expression"
                    },
                    "isSequential": true,
                    "activities": [
                        {
                            "name": "get current File name using Prop_n",
                            "type": "SetVariable",
                            "dependsOn": [],
                            "policy": {
                                "secureOutput": false,
                                "secureInput": false
                            },
                            "userProperties": [],
                            "typeProperties": {
                                "variableName": "filename1",
                                "value": {
                                    "value": "@activity('Lookup1').output.value[0][concat('Prop_',string(item()))]",
                                    "type": "Expression"
                                }
                            }
                        },
                        {
                            "name": "Replace extra spaces and char",
                            "type": "SetVariable",
                            "dependsOn": [
                                {
                                    "activity": "get current File name using Prop_n",
                                    "dependencyConditions": [
                                        "Succeeded"
                                    ]
                                }
                            ],
                            "policy": {
                                "secureOutput": false,
                                "secureInput": false
                            },
                            "userProperties": [],
                            "typeProperties": {
                                "variableName": "filename",
                                "value": {
                                    "value": "@replace(replace(replace(replace(variables('filename1'), '[', ''), ']', ''), ' ', ''), '''', '')",
                                    "type": "Expression"
                                }
                            }
                        }
                    ]
                }
            }
        ],
        "variables": {
            "num_props": {
                "type": "String"
            },
            "filename1": {
                "type": "String"
            },
            "filename": {
                "type": "String"
            }
        },
        "annotations": []
    }
}

英文:

As your requirement is to get only one CSV filename per ForEach iteration, you can follow the below approach.

After lookup, get the number of Prop keys in the object and storing in a string variable by using below expression.

@string(sub(length(split(string(activity(&#39;Lookup1&#39;).output.value),&#39;:&#39;)), 1))

Azure数据工厂：在Python的ForEach循环中使用Lookup结果

Now, using this number, create an array of numbers like [0,1,2,..,10] and give that to ForEach activity.

@range(0,int(variables(&#39;num_props&#39;)))

Azure数据工厂：在Python的ForEach循环中使用Lookup结果

Inside ForEach, store the file name from the lookup activity output using the below expression.

@activity(&#39;Lookup1&#39;).output.value[0][concat(&#39;Prop_&#39;,string(item()))]

Azure数据工厂：在Python的ForEach循环中使用Lookup结果

This will give the file name in the current iteration, but it will have the special characters like this 'BAT.WISCDU7C.SC.DK004017.D2023191.CSV'].

If you want to remove those, use the below expression which replaces them with empty string.

@replace(replace(replace(replace(variables(&#39;filename1&#39;), &#39;[&#39;, &#39;&#39;),&#39;]&#39;,&#39;&#39;),&#39; &#39;,&#39;&#39;),&#39;&#39;&#39;&#39;,&#39;&#39;)

Azure数据工厂：在Python的ForEach循环中使用Lookup结果

Now, give this variable filename to your Custom Batch Python script inside ForEach activity. This will pass the filename every iteration to your script.

My Filename in an iteration:

Azure数据工厂：在Python的ForEach循环中使用Lookup结果

My Pipeline JSON for your reference:

{
&quot;name&quot;: &quot;pipeline1&quot;,
&quot;properties&quot;: {
&quot;activities&quot;: [
{
&quot;name&quot;: &quot;Lookup1&quot;,
&quot;type&quot;: &quot;Lookup&quot;,
&quot;dependsOn&quot;: [],
&quot;policy&quot;: {
&quot;timeout&quot;: &quot;0.12:00:00&quot;,
&quot;retry&quot;: 0,
&quot;retryIntervalInSeconds&quot;: 30,
&quot;secureOutput&quot;: false,
&quot;secureInput&quot;: false
},
&quot;userProperties&quot;: [],
&quot;typeProperties&quot;: {
&quot;source&quot;: {
&quot;type&quot;: &quot;DelimitedTextSource&quot;,
&quot;storeSettings&quot;: {
&quot;type&quot;: &quot;AzureBlobFSReadSettings&quot;,
&quot;recursive&quot;: true,
&quot;enablePartitionDiscovery&quot;: false
},
&quot;formatSettings&quot;: {
&quot;type&quot;: &quot;DelimitedTextReadSettings&quot;
}
},
&quot;dataset&quot;: {
&quot;referenceName&quot;: &quot;sourcecsv_ADLS&quot;,
&quot;type&quot;: &quot;DatasetReference&quot;
},
&quot;firstRowOnly&quot;: false
}
},
{
&quot;name&quot;: &quot;Get the number of Props&quot;,
&quot;type&quot;: &quot;SetVariable&quot;,
&quot;dependsOn&quot;: [
{
&quot;activity&quot;: &quot;Lookup1&quot;,
&quot;dependencyConditions&quot;: [
&quot;Succeeded&quot;
]
}
],
&quot;policy&quot;: {
&quot;secureOutput&quot;: false,
&quot;secureInput&quot;: false
},
&quot;userProperties&quot;: [],
&quot;typeProperties&quot;: {
&quot;variableName&quot;: &quot;num_props&quot;,
&quot;value&quot;: {
&quot;value&quot;: &quot;@string(sub(length(split(string(activity(&#39;Lookup1&#39;).output.value),&#39;:&#39;)), 1))&quot;,
&quot;type&quot;: &quot;Expression&quot;
}
}
},
{
&quot;name&quot;: &quot;ForEach1&quot;,
&quot;type&quot;: &quot;ForEach&quot;,
&quot;dependsOn&quot;: [
{
&quot;activity&quot;: &quot;Get the number of Props&quot;,
&quot;dependencyConditions&quot;: [
&quot;Succeeded&quot;
]
}
],
&quot;userProperties&quot;: [],
&quot;typeProperties&quot;: {
&quot;items&quot;: {
&quot;value&quot;: &quot;@range(0,int(variables(&#39;num_props&#39;)))&quot;,
&quot;type&quot;: &quot;Expression&quot;
},
&quot;isSequential&quot;: true,
&quot;activities&quot;: [
{
&quot;name&quot;: &quot;get current File name using Prop_n&quot;,
&quot;type&quot;: &quot;SetVariable&quot;,
&quot;dependsOn&quot;: [],
&quot;policy&quot;: {
&quot;secureOutput&quot;: false,
&quot;secureInput&quot;: false
},
&quot;userProperties&quot;: [],
&quot;typeProperties&quot;: {
&quot;variableName&quot;: &quot;filename1&quot;,
&quot;value&quot;: {
&quot;value&quot;: &quot;@activity(&#39;Lookup1&#39;).output.value[0][concat(&#39;Prop_&#39;,string(item()))]&quot;,
&quot;type&quot;: &quot;Expression&quot;
}
}
},
{
&quot;name&quot;: &quot;Replace extra spaces and char&quot;,
&quot;type&quot;: &quot;SetVariable&quot;,
&quot;dependsOn&quot;: [
{
&quot;activity&quot;: &quot;get current File name using Prop_n&quot;,
&quot;dependencyConditions&quot;: [
&quot;Succeeded&quot;
]
}
],
&quot;policy&quot;: {
&quot;secureOutput&quot;: false,
&quot;secureInput&quot;: false
},
&quot;userProperties&quot;: [],
&quot;typeProperties&quot;: {
&quot;variableName&quot;: &quot;filename&quot;,
&quot;value&quot;: {
&quot;value&quot;: &quot;@replace(replace(replace(replace(variables(&#39;filename1&#39;), &#39;[&#39;, &#39;&#39;),&#39;]&#39;,&#39;&#39;),&#39; &#39;,&#39;&#39;),&#39;&#39;&#39;&#39;,&#39;&#39;)&quot;,
&quot;type&quot;: &quot;Expression&quot;
}
}
}
]
}
}
],
&quot;variables&quot;: {
&quot;num_props&quot;: {
&quot;type&quot;: &quot;String&quot;
},
&quot;filename1&quot;: {
&quot;type&quot;: &quot;String&quot;
},
&quot;filename&quot;: {
&quot;type&quot;: &quot;String&quot;
}
},
&quot;annotations&quot;: []
}
}

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

Azure数据工厂：在Python的ForEach循环中使用Lookup结果

问题

答案1

在Pandas的groupby中按多列进行缩减。

在Python中找到树上的所有唯一单一分支。

我正在尝试将Java代码转换为Python代码。

如何在多级索引的数据框中添加 level 0 列并保持格式？

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

发表评论