string_agg substitute used in Derive transformation in Azure data flow expression language
Question
How do I convert this SQL code "left(string_agg(zationClient, '; ') within group (order by zationClient), 100)" into a transformation in Data Factory, specifically in the Azure data flow expression language?
I am expecting an output like this, with "; " as the delimiter: "NAFDQ-Trulife; EAQVN - Trulife"
I understand that string_agg is not supported in the data flow expression language.
I tried substring(concat(zationClient, '; '), 1, 128), but it doesn't give the required result.
Answer 1
Score: 0
- You need to use a combination of a derived column transformation, a sort transformation, and an aggregate transformation to get the desired result. The following is the starting data that I have:
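As an assumed illustration of such starting data (values taken from the question, schema from the source projection in the script below), the input could look like this:

id,con
1,EAQVN - Trulife
2,NAFDQ-Trulife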
- I have added a new column tp with a static value of 1. This can be used to group the data as required.
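For reference, this derived column step corresponds to the following line in the scriptLines of the data flow JSON further down:

source1 derive(tp = 1) ~> derivedColumn1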
- Now sort the data on the required column using a sort transformation, with configurations similar to the ones shown below:
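The sort step appears in the script below as the following line; it takes the place of the within group (order by zationClient) clause in the original SQL, so that the aggregation in the next step sees the values already in the desired order:

derivedColumn1 sort(asc(con, true), caseInsensitive: true) ~> sort1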
- Now use the aggregate transformation to group by the tp column. In the Aggregates tab, use the required column with the following expression to get the desired result:
left(substring(toString(reduce(collect(con), '', #acc+';'+#item, #result)),2),100)
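As a rough trace of what each function contributes, assuming the group contains the two sample values from the question, 'EAQVN - Trulife' and 'NAFDQ-Trulife' (already sorted ascending by the previous step):

collect(con)                                  -> ['EAQVN - Trulife', 'NAFDQ-Trulife']   (all values of the group gathered into an array)
reduce(..., '', #acc + ';' + #item, #result)  -> ';EAQVN - Trulife;NAFDQ-Trulife'       (folds the array into one string, leaving a leading ';')
substring(..., 2)                             -> 'EAQVN - Trulife;NAFDQ-Trulife'        (drops that leading separator)
left(..., 100)                                -> the first 100 characters, mirroring left(..., 100) in the original SQL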
- The following is the entire Dataflow JSON:
{
    "name": "dataflow1",
    "properties": {
        "type": "MappingDataFlow",
        "typeProperties": {
            "sources": [
                {
                    "dataset": {
                        "referenceName": "DelimitedText1",
                        "type": "DatasetReference"
                    },
                    "name": "source1"
                }
            ],
            "sinks": [
                {
                    "name": "sink1"
                }
            ],
            "transformations": [
                {
                    "name": "derivedColumn1"
                },
                {
                    "name": "sort1"
                },
                {
                    "name": "aggregate1"
                }
            ],
            "scriptLines": [
                "source(output(",
                "     id as short,",
                "     con as string",
                "     ),",
                "     allowSchemaDrift: true,",
                "     validateSchema: false,",
                "     ignoreNoFilesFound: false) ~> source1",
                "source1 derive(tp = 1) ~> derivedColumn1",
                "derivedColumn1 sort(asc(con, true),",
                "     caseInsensitive: true) ~> sort1",
                "sort1 aggregate(groupBy(tp),",
                "     con = left(substring(toString(reduce(collect(con), '', #acc+';'+#item, #result)),2),100)) ~> aggregate1",
                "aggregate1 sink(validateSchema: false,",
                "     skipDuplicateMapInputs: true,",
                "     skipDuplicateMapOutputs: true,",
                "     store: 'cache',",
                "     format: 'inline',",
                "     output: false,",
                "     saveOrder: 1) ~> sink1"
            ]
        }
    }
}
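Note that the expression above joins values with ';', while the expected output in the question uses '; ' (semicolon plus space). If that exact format is needed, the delimiter in the accumulator expression can be adjusted; as a sketch (not part of the original answer; the substring start index moves from 2 to 3 because the leading separator is now two characters long):

con = left(substring(toString(reduce(collect(con), '', #acc + '; ' + #item, #result)), 3), 100)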