string_agg substitute used in Derive transformation in Azure data flow expression language
Question
How do I convert this SQL code "left(string_agg(zationClient, '; ') within group (order by zationClient), 100)" into a transformation in Data Factory, specifically in the Azure data flow expression language?
I am expecting an output like this, with "; " as the delimiter: "NAFDQ-Trulife; EAQVN - Trulife"
I understand that string_agg is not supported in the data flow expression language.
I tried substring(concat(zationClient, '; '), 1, 128), but it doesn't give the required result.
Answer 1
Score: 0
- You need to use a combination of a derived column transformation, a sort transformation, and an aggregate transformation to get the desired result. The following is the starting data that I have:
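As an assumed illustration of such starting data (values taken from the question, schema from the source projection in the script below), the input could look like this:

id,con
1,EAQVN - Trulife
2,NAFDQ-Trulife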
- I have added a new column tp with a static value of 1. This can be used to group the data as required.
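For reference, this derived column step corresponds to the following line in the scriptLines of the data flow JSON further down:

source1 derive(tp = 1) ~> derivedColumn1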
- Now sort the data on the required column using a sort transformation, with configurations similar to the ones shown below:
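The sort step appears in the script below as the following line; it takes the place of the within group (order by zationClient) clause in the original SQL, so that the aggregation in the next step sees the values already in the desired order:

derivedColumn1 sort(asc(con, true), caseInsensitive: true) ~> sort1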
- Now use the aggregate transformation to group by the tp column. In the Aggregates tab, use the required column with the following expression to get the desired result:
left(substring(toString(reduce(collect(con), '', #acc+';'+#item, #result)),2),100)
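As a rough trace of what each function contributes, assuming the group contains the two sample values from the question, 'EAQVN - Trulife' and 'NAFDQ-Trulife' (already sorted ascending by the previous step):

collect(con)                                  -> ['EAQVN - Trulife', 'NAFDQ-Trulife']   (all values of the group gathered into an array)
reduce(..., '', #acc + ';' + #item, #result)  -> ';EAQVN - Trulife;NAFDQ-Trulife'       (folds the array into one string, leaving a leading ';')
substring(..., 2)                             -> 'EAQVN - Trulife;NAFDQ-Trulife'        (drops that leading separator)
left(..., 100)                                -> the first 100 characters, mirroring left(..., 100) in the original SQL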
- The following is the entire Dataflow JSON:
{
    "name": "dataflow1",
    "properties": {
        "type": "MappingDataFlow",
        "typeProperties": {
            "sources": [
                {
                    "dataset": {
                        "referenceName": "DelimitedText1",
                        "type": "DatasetReference"
                    },
                    "name": "source1"
                }
            ],
            "sinks": [
                {
                    "name": "sink1"
                }
            ],
            "transformations": [
                {
                    "name": "derivedColumn1"
                },
                {
                    "name": "sort1"
                },
                {
                    "name": "aggregate1"
                }
            ],
            "scriptLines": [
                "source(output(",
                "     id as short,",
                "     con as string",
                "     ),",
                "     allowSchemaDrift: true,",
                "     validateSchema: false,",
                "     ignoreNoFilesFound: false) ~> source1",
                "source1 derive(tp = 1) ~> derivedColumn1",
                "derivedColumn1 sort(asc(con, true),",
                "     caseInsensitive: true) ~> sort1",
                "sort1 aggregate(groupBy(tp),",
                "     con = left(substring(toString(reduce(collect(con), '', #acc+';'+#item, #result)),2),100)) ~> aggregate1",
                "aggregate1 sink(validateSchema: false,",
                "     skipDuplicateMapInputs: true,",
                "     skipDuplicateMapOutputs: true,",
                "     store: 'cache',",
                "     format: 'inline',",
                "     output: false,",
                "     saveOrder: 1) ~> sink1"
            ]
        }
    }
}
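Note that the expression above joins values with ';', while the expected output in the question uses '; ' (semicolon plus space). If that exact format is needed, the delimiter in the accumulator expression can be adjusted; as a sketch (not part of the original answer; the substring start index moves from 2 to 3 because the leading separator is now two characters long):

con = left(substring(toString(reduce(collect(con), '', #acc + '; ' + #item, #result)), 3), 100)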