June 6, 2023 12:25:33


json object is too complex for ADF dataflow to parse/roll

Question


I have a JSON document that looks like the one below:

I need to collect the seller ID and seller name for now. I tried to do this in an ADF data flow, but for some reason the flatten transformation was unable to unroll by sellers. It seems that sellers is not a valid array.

What I tried:

I copied the file to blob storage, tweaked some settings (such as setting the file pattern to 'array of objects'), and used it as the source of the data flow. That didn't work.
I tried to use an aggregate transformation, but I am not sure how to apply it. I am new to data flows.

If this is not possible in a data flow at all, I am also open to PySpark. Can it be done easily there?

Answer 1

Score: 1


I have taken the following sample data in order to get the sellerID and sellerName.

{
  "timestamp": 1686036525840,
  "tokensLeft": 1,
  "refillIn": 1,
  "refillRate": 1,
  "tokenFlowReduction": 0.0,
  "tokensConsumed": 2,
  "processingTimeInMs": 0,
  "sellers": {
    "ABC": {
      "trackedSince": 1,
      "domainId": 1,
      "sellerId": "XX",
      "sellerName": "LS",
      "csv": [[6062020], [6062020]],
      "lastUpdate": 6536604,
      "isScammer": false,
      "hasFBA": true,
      "totalStorefrontAsins": [1, 44],
      "sellerCategoryStatistics": [
        {"catId": 1, "productCount": 2, "avg30SalesRank": 3, "productCountWithAmazonOffer": 5},
        {"catId": 3, "productCount": 11, "avg30SalesRank": 72203, "productCountWithAmazonOffer": 3}
      ],
      "sellerBrandStatistics": [
        {"brand": "1", "productCount": 3, "avg30SalesRank": 18820, "productCountWithAmazonOffer": 0},
        {"brand": "l3", "productCount": 3, "avg30SalesRank": 32525, "productCountWithAmazonOffer": 1},
        {"brand": "1", "productCount": 3, "avg30SalesRank": 40102, "productCountWithAmazonOffer": 1},
        {"brand": "1", "productCount": 1, "avg30SalesRank": 5315, "productCountWithAmazonOffer": 0}
      ],
      "shipsFromChina": false,
      "address": ["1"],
      "recentFeedback": [
        {"rating": 10, "date": 6531840, "feedback": "Never .  Black ", "isStriked": true},
        {"rating": 50, "date": 6523200, "feedback": "I received.", "isStriked": false},
        {"rating": 50, "date": 6521760, "feedback": "Fast ship a", "isStriked": false},
        {"rating": 50, "date": 6518880, "feedback": "It  to .", "isStriked": false},
        {"rating": 20, "date": 6517440, "feedback": "I r .", "isStriked": true}
      ],
      "lastRatingUpdate": 6536604,
      "neutralRating": [0, 0, 0, 0],
      "negativeRating": [0, 0, 2, 2],
      "positiveRating": [100, 100, 98, 98],
      "ratingCount": [7, 23, 59, 59],
      "currentRating": 1,
      "currentRatingCount": 1,
      "ratingsLast30Days": 6
    },
    "XYZ": {
      "trackedSince": 2795761,
      "domainId": 1,
      "sellerId": "CVB",
      "sellerName": "ABC",
      "csv": [[271], [6, 46101]],
      "lastUpdate": 6536586,
      "isScammer": false,
      "hasFBA": true,
      "totalStorefrontAsins": [1, 1],
      "sellerCategoryStatistics": [
        {"catId": 1, "productCount": 131, "avg30SalesRank": 11, "productCountWithAmazonOffer": 1},
        {"catId": 3760911, "productCount": 106, "avg30SalesRank": 101900, "productCountWithAmazonOffer": 8},
        {"catId": 1055398, "productCount": 93, "avg30SalesRank": 107441, "productCountWithAmazonOffer": 41},
        {"catId": 3760901, "productCount": 60, "avg30SalesRank": 11, "productCountWithAmazonOffer": 3},
        {"catId": 11, "productCount": 11, "avg30SalesRank": 11, "productCountWithAmazonOffer": 5},
        {"catId": 11, "productCount": 28, "avg30SalesRank": 489, "productCountWithAmazonOffer": 23},
        {"catId": 1, "productCount": 1, "avg30SalesRank": 1, "productCountWithAmazonOffer": 0},
        {"catId": 1, "productCount": 12, "avg30SalesRank": 1, "productCountWithAmazonOffer": 0},
        {"catId": 1, "productCount": 8, "avg30SalesRank": 1, "productCountWithAmazonOffer": 4},
        {"catId": 1, "productCount": 6, "avg30SalesRank": 1, "productCountWithAmazonOffer": 3}
      ],
      "sellerBrandStatistics": [
        {"brand": "1e", "productCount": 35, "avg30SalesRank": 1, "productCountWithAmazonOffer": 0}
      ]
    }
  }
}

Now, using the select transformation with rule-based mapping, I have selected the properties of sellers.


I have used a derived column transformation to add a new column called columns, with the value columnNames('select1'). This gives an array column with the value ["ABC","XYZ"] (the keys we need).
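As a rough illustration of why flatten has nothing to unroll here, the plain-Python sketch below loads a trimmed copy of the sample data and lists the keys under sellers. It is only an analogy for what columnNames('select1') returns, not data flow code; the trimmed document and the variable names are assumptions made for the example.

```python
import json

# Trimmed copy of the sample document; only the fields needed here are kept.
doc = json.loads("""
{
  "timestamp": 1686036525840,
  "sellers": {
    "ABC": {"sellerId": "XX", "sellerName": "LS"},
    "XYZ": {"sellerId": "CVB", "sellerName": "ABC"}
  }
}
""")

sellers = doc["sellers"]

# "sellers" is an object keyed by seller, not an array,
# so there is no array for a flatten step to unroll.
is_array = isinstance(sellers, list)

# The keys play the role of the column names selected from "sellers".
columns = list(sellers.keys())

print(is_array)
print(columns)
```

Because the seller keys are property names rather than array elements, they have to be collected as names first and then used to look up each seller.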


I am using a cache sink and writing its output to the activity output. The following is the complete data flow JSON:

{
  "name": "dataflow1",
  "properties": {
    "type": "MappingDataFlow",
    "typeProperties": {
      "sources": [
        {
          "dataset": {
            "referenceName": "Json1",
            "type": "DatasetReference"
          },
          "name": "source1"
        }
      ],
      "sinks": [
        {
          "name": "sink1"
        }
      ],
      "transformations": [
        {
          "name": "select1"
        },
        {
          "name": "derivedColumn1"
        }
      ],
      "scriptLines": [
        "source(output(",
        "          timestamp as integer,",
        "          tokensLeft as integer,",
        "          refillIn as integer,",
        "          refillRate as integer,",
        "          tokenFlowReduction as double,",
        "          tokensConsumed as integer,",
        "          processingTimeInMs as integer,",
        "          sellers as (ABC as (trackedSince as integer, domainId as integer, sellerId as string, sellerName as string, csv as integer[][], lastUpdate as integer, isScammer as boolean, hasFBA as boolean, totalStorefrontAsins as integer[], sellerCategoryStatistics as (catId as integer, productCount as integer, avg30SalesRank as integer, productCountWithAmazonOffer as integer)[], sellerBrandStatistics as (brand as string, productCount as integer, avg30SalesRank as integer, productCountWithAmazonOffer as integer)[], shipsFromChina as boolean, address as string[], recentFeedback as (rating as integer, date as integer, feedback as string, isStriked as boolean)[], lastRatingUpdate as integer, neutralRating as integer[], negativeRating as integer[], positiveRating as integer[], ratingCount as integer[], currentRating as integer, currentRatingCount as integer, ratingsLast30Days as integer), XYZ as (trackedSince as integer, domainId as integer, sellerId as string, sellerName as string, csv as integer[][], lastUpdate as integer, isScammer as boolean, hasFBA as boolean, totalStorefrontAsins as integer[], sellerCategoryStatistics as (catId as integer, productCount as integer, avg30SalesRank as integer, productCountWithAmazonOffer as integer)[], sellerBrandStatistics as (brand as string, productCount as integer, avg30SalesRank as integer, productCountWithAmazonOffer as integer)[]))",
        "     ),",
        "     allowSchemaDrift: true,",
        "     validateSchema: false,",
        "     ignoreNoFilesFound: false,",
        "     documentForm: 'singleDocument') ~> source1",
        "source1 select(mapColumn(",
        "          each(sellers,match(true()))",
        "     ),",
        "     skipDuplicateMapInputs: true,",
        "     skipDuplicateMapOutputs: true) ~> select1",
        "select1 derive(columns = columnNames('select1')) ~> derivedColumn1",
        "derivedColumn1 sink(validateSchema: false,",
        "     skipDuplicateMapInputs: true,",
        "     skipDuplicateMapOutputs: true,",
        "     store: 'cache',",
        "     format: 'inline',",
        "     output: true,",
        "     saveOrder: 1) ~> sink1"
      ]
    }
  }
}


Now, iterate over the columns array with a ForEach activity, using the items value @activity('Data flow1').output.runStatus.output.sink1.value[0].columns.
Inside the ForEach, I first have a Set variable activity that builds each object containing sellerId and sellerName, with the following dynamic content:

{
    "sellerId":"@{activity('Data flow1').output.runStatus.output.sink1.value[0][item()].sellerId}",
    "sellerName":"@{activity('Data flow1').output.runStatus.output.sink1.value[0][item()].sellerName}"
}
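To make the lookup in this expression easier to follow, here is a small plain-Python sketch of the same navigation. The activity_output dictionary is a hypothetical stand-in whose shape is inferred only from the expression path (runStatus, output, sink1, value[0]); the real activity output may carry additional fields. The helper name build_tp is also made up for the example.

```python
import json

# Hypothetical stand-in for the Data flow activity output.
# The shape is inferred from the expression path, nothing more.
activity_output = {
    "runStatus": {"output": {"sink1": {"value": [{
        "ABC": {"sellerId": "XX", "sellerName": "LS"},
        "XYZ": {"sellerId": "CVB", "sellerName": "ABC"},
        "columns": ["ABC", "XYZ"],
    }]}}}
}

# Equivalent of ...output.runStatus.output.sink1.value[0]
row = activity_output["runStatus"]["output"]["sink1"]["value"][0]


def build_tp(item):
    """Mimic the Set variable string interpolation for one ForEach item."""
    seller = row[item]  # same idea as value[0][item()]
    return '{\n    "sellerId":"%s",\n    "sellerName":"%s"\n}' % (
        seller["sellerId"],
        seller["sellerName"],
    )


# One JSON-formatted string per key in the "columns" array.
tp_values = [build_tp(item) for item in row["columns"]]
print(tp_values[0])
```

Note that the result of each iteration is a string that happens to contain JSON, which matches the String type of the tp variable. Since the values are interpolated without escaping, a seller name containing a double quote would likely produce invalid JSON in this sketch.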


Next, I have used an Append variable activity to append this object to an array variable, using the dynamic content @json(variables('tp')).


I have used another Set variable activity, for demonstration purposes only, to show the value of the array that was just built.
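The role of json() in the append step can be illustrated with a short plain-Python analogy, where json.loads stands in for @json(...) and a list stands in for the array variable. The two input strings below are assumed values of tp for the sample data, written compactly for readability.

```python
import json

# Assumed values of the string variable "tp" across the two iterations.
tp_values = [
    '{"sellerId":"XX","sellerName":"LS"}',
    '{"sellerId":"CVB","sellerName":"ABC"}',
]

req = []  # stands in for the Array variable "req"
for tp in tp_values:
    # Parse the string into an object before appending, so the array
    # holds objects rather than JSON text.
    req.append(json.loads(tp))

print(json.dumps(req))
```

Without the parsing step, the array in this analogy would hold two strings, and serializing it would give escaped JSON text instead of a list of seller objects.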


Now you can write this variable's value to a file as JSON. Refer to the solution provided in this question. The following is the pipeline JSON:

{
  "name": "pipeline1",
  "properties": {
    "activities": [
      {
        "name": "Data flow1",
        "type": "ExecuteDataFlow",
        "dependsOn": [],
        "policy": {
          "timeout": "0.12:00:00",
          "retry": 0,
          "retryIntervalInSeconds": 30,
          "secureOutput": false,
          "secureInput": false
        },
        "userProperties": [],
        "typeProperties": {
          "dataflow": {
            "referenceName": "dataflow1",
            "type": "DataFlowReference"
          },
          "compute": {
            "coreCount": 8,
            "computeType": "General"
          },
          "traceLevel": "None",
          "cacheSinks": {
            "firstRowOnly": false
          }
        }
      },
      {
        "name": "ForEach1",
        "type": "ForEach",
        "dependsOn": [
          {
            "activity": "Data flow1",
            "dependencyConditions": ["Succeeded"]
          }
        ],
        "userProperties": [],
        "typeProperties": {
          "items": {
            "value": "@activity('Data flow1').output.runStatus.output.sink1.value[0].columns",
            "type": "Expression"
          },
          "isSequential": true,
          "activities": [
            {
              "name": "Append variable1",
              "type": "AppendVariable",
              "dependsOn": [
                {
                  "activity": "Set variable1",
                  "dependencyConditions": ["Succeeded"]
                }
              ],
              "userProperties": [],
              "typeProperties": {
                "variableName": "req",
                "value": {
                  "value": "@json(variables('tp'))",
                  "type": "Expression"
                }
              }
            },
            {
              "name": "Set variable1",
              "type": "SetVariable",
              "dependsOn": [],
              "userProperties": [],
              "typeProperties": {
                "variableName": "tp",
                "value": {
                  "value": "{\n    \"sellerId\":\"@{activity('Data flow1').output.runStatus.output.sink1.value[0][item()].sellerId}\",\n    \"sellerName\":\"@{activity('Data flow1').output.runStatus.output.sink1.value[0][item()].sellerName}\"\n}",
                  "type": "Expression"
                }
              }
            }
          ]
        }
      },
      {
        "name": "Set variable2",
        "type": "SetVariable",
        "dependsOn": [
          {
            "activity": "ForEach1",
            "dependencyConditions": ["Succeeded"]
          }
        ],
        "userProperties": [],
        "typeProperties": {
          "variableName": "test",
          "value": {
            "value": "@variables('req')",
            "type": "Expression"
          }
        }
      }
    ],
    "variables": {
      "tp": {
        "type": "String"
      },
      "req": {
        "type": "Array"
      },
      "test": {
        "type": "Array"
      }
    },
    "annotations": []
  }
}
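Since the question also asks whether this could be done outside a data flow, the following is a minimal end-to-end sketch in plain Python (not PySpark, and not part of the pipeline above). It assumes the whole document fits in memory; the function name extract_sellers, the trimmed sample, and the output file name sellers.json are all made up for illustration.

```python
import json
import os
import tempfile


def extract_sellers(doc):
    """Return one {sellerId, sellerName} object per key under "sellers"."""
    return [
        {"sellerId": seller.get("sellerId"), "sellerName": seller.get("sellerName")}
        for seller in doc.get("sellers", {}).values()
    ]


# Trimmed copy of the sample data; the other fields are ignored anyway.
sample = {
    "timestamp": 1686036525840,
    "sellers": {
        "ABC": {"sellerId": "XX", "sellerName": "LS", "hasFBA": True},
        "XYZ": {"sellerId": "CVB", "sellerName": "ABC", "hasFBA": True},
    },
}

result = extract_sellers(sample)

# Write the array to a JSON file in a temporary directory.
out_path = os.path.join(tempfile.mkdtemp(), "sellers.json")
with open(out_path, "w", encoding="utf-8") as fh:
    json.dump(result, fh, indent=2)

print(result)
```

The same idea (iterate over the values of the sellers object and keep two fields) is what the ForEach loop in the pipeline does one key at a time.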
