雪花平铺和解析数值

huangapple go评论60阅读模式
英文:

Snowflake Flatten and parsing values

问题

倒序后的翻译如下:

]] B12345 ,B12345 , ,Lname1, Name1 , [https://url.url.com/rest/api/2/user?username=B12345 ,B12345 , , ,Lname1, Name1 ,[https://url.url.com/secure/?ownerId=B12345&id=123 ,https://url.url.com/secure/?size=medium&ownerId=B12345&id=123 ,https://url.url.com/secure/?size=small&ownerId=B12345&id=123 ,https://url.url.com/secure/?size=xsmall&ownerId=B12345&id=123 ,false,]] ,C12345 ,C12345 , ,Lname3. ,Name1 , [India/Mumbai ,https://url.url.com/rest/api/2/user?username=B12345 ,C12345 , , ,Lname3. ,Name1 ,[https://url.url.com/secure/?ownerId=C12345&id=456 ,https://url.url.com/secure/?size=medium&ownerId=C12345&id=456 ,https://url.url.com/secure/?size=small&ownerId=C12345&id=456 ,https://url.url.com/secure/?size=xsmall&ownerId=C12345&id=456 ,true,]] ,A12345 ,A12345 ,name.lname@abcdef.com ,Lname2. ,Name1 , [India/Mumbai ,https://url.url.com/rest/api/2/user?username=B12345 ,A12345 ,name.lname@abcdef.com , ,Lname2. ,Name1 ,[https://url.url.com/secure/?ownerId=A12345&id=456 ,https://url.url.com/secure/?size=medium&ownerId=A12345&id=456 ,https://url.url.com/secure/?size=small&ownerId=A12345&id=456 ,https://url.url.com/secure/?size=xsmall&ownerId=A12345&id=456 ,true,]]]

这是你提供的内容的倒序翻译。如果需要进一步的帮助,请随时提出。

英文:

I have a snowflake table that has VARCHAR column containing input from api. I have 300+ columns to be flattened and one of them has input like below for a particular row. I need to parse the values from the below (please refer output) and store it as a single row for a particular input row. The number of elements inside the input list might be 1 or more than 1 but the order of the keys present remain same.

Input:

[
{"active":false,"urls":{"6x6":"https://url.url.com/secure/?size=xsmall&ownerId=B12345&id=123","4x4":"https://url.url.com/secure/?size=small&ownerId=B12345&id=123","22x22":"https://url.url.com/secure/?size=medium&ownerId=B12345&id=123","44x44":"https://url.url.com/secure/?ownerId=B12345&id=123"},"displayName":"Name1,Lname1","emailAddress":"","key":"B12345","name":"B12345","self":"https://url.url.com/rest/api/2/user?username=B12345","timeZone":"India/Mumbai"},


{"active":true,"urls":{"6x6":"https://url.url.com/secure/?size=xsmall&ownerId=A12345&id=456","4x4":"https://url.url.com/secure/?size=small&ownerId=A12345&id=456","22x22":"https://url.url.com/secure/?size=medium&ownerId=A12345&id=456","44x44":"https://url.url.com/secure/?ownerId=A12345&id=456"},"displayName":"Name1,Lname2.","emailAddress":"name.lname@abcdef.com","key":"A12345","name":"A12345","self":"https://url.url.com/rest/api/2/user?username=A12345","timeZone":"India/Mumbai"}

 
{"active":true,"urls":{"6x6":"https://url.url.com/secure/?size=xsmall&ownerId=C12345&id=456","4x4":"https://url.url.com/secure/?size=small&ownerId=C12345&id=456","22x22":"https://url.url.com/secure/?size=medium&ownerId=C12345&id=456","44x44":"https://url.url.com/secure/?ownerId=C12345&id=456"},"displayName":"Name1,Lname3.","emailAddress":"name.lname@abcdef.com","key":"C12345","name":"C12345","self":"https://url.url.com/rest/api/2/user?username=C12345","timeZone":"India/Mumbai"}
 ]

Output I am looking for is: list of values of the keys (nested)

[ 
[false,
[https://url.url.com/secure/?size=xsmall&ownerId=B12345&id=123,
https://url.url.com/secure/?size=small&ownerId=B12345&id=123,
https://url.url.com/secure/?size=medium&ownerId=B12345&id=123,
https://url.url.com/secure/?ownerId=B12345&id=123], Name1, Lname1, ,
B12345, B12345, https://url.url.com/rest/api/2/user?username=B12345,
India/Mumbai],  

[true,
[https://url.url.com/secure/?size=xsmall&ownerId=A12345&id=456,
https://url.url.com/secure/?size=small&ownerId=A12345&id=456,
https://url.url.com/secure/?size=medium&ownerId=A12345&id=456,
https://url.url.com/secure/?ownerId=A12345&id=456], Name1, Lname2.,
name.lname@abcdef.com, A12345,
A12345,https://url.url.com/rest/api/2/user?username=B12345,
India/Mumbai] ,

[true,
[https://url.url.com/secure/?size=xsmall&ownerId=C12345&id=456,
https://url.url.com/secure/?size=small&ownerId=C12345&id=456,
https://url.url.com/secure/?size=medium&ownerId=C12345&id=456,
https://url.url.com/secure/?ownerId=C12345&id=456], Name1, Lname3.,
name.lname@abcdef.com, C12345,
C12345,https://url.url.com/rest/api/2/user?username=B12345,
India/Mumbai] 
]

Query that I tried:

SELECT  
CONCAT('[[',
REPLACE(GET(flattened.value,  'active'), '"', '') , ', ',
'[',
listagg(CASE WHEN flattened_nested.value LIKE '%https%' THEN 'https:' || REPLACE(SPLIT_PART(flattened_nested.value, ':', 2), '"', '') ELSE NULL END,',')within group (order by null)

,']', ', ',  --<== this causes issue while trying to parse urls values for each elements and to store them.
REPLACE(GET(flattened.value,  'displayName'), '"', '') , ', ',
REPLACE(GET(flattened.value,  'emailAddress'), '"', '') , ', ',
REPLACE(GET(flattened.value,  'key'), '"', '') , ', ',
REPLACE(GET(flattened.value,  'name'), '"', '')  , ', ',
REPLACE(GET(flattened.value,  'self'), '"', '') , ', ',
REPLACE(GET(flattened.value,  'timeZone'), '"', ''),   

']]')  as output_column
FROM snowflake_table  SRC

,LATERAL FLATTEN(input=>SRC.json_values:"fields":"field_12345") AS flattened --<== the field that contains the above input.
,LATERAL FLATTEN(input=>flattened.value:urls) AS flattened_nested
group by  flattened.value
;
  • The below output that I get is aggregating all the 3 urls value for a particular input and stores it as comma separated. but, it doesn't take the other values like displayname, key,name, self etc for all 3 elements. It gives me only the first occurrence.

  • The other issue with this approach is if I need to include other 300+ columns, I have place everything in group by clause.

> [ [false,
> [https://url.url.com/secure/?size=xsmall&ownerId=B12345&id=123,
> https://url.url.com/secure/?size=small&ownerId=B12345&id=123,
> https://url.url.com/secure/?size=medium&ownerId=B12345&id=123,
> https://url.url.com/secure/?ownerId=B12345&id=123,https://url.url.com/secure/?size=xsmall&ownerId=A12345&id=456,
> https://url.url.com/secure/?size=small&ownerId=A12345&id=456,
> https://url.url.com/secure/?size=medium&ownerId=A12345&id=456,
> https://url.url.com/secure/?ownerId=A12345&id=456,https://url.url.com/secure/?size=xsmall&ownerId=C12345&id=456,
> https://url.url.com/secure/?size=small&ownerId=C12345&id=456,
> https://url.url.com/secure/?size=medium&ownerId=C12345&id=456,
> https://url.url.com/secure/?ownerId=C12345&id=456], Name1, Lname1, ,
> B12345, B12345, https://url.url.com/rest/api/2/user?username=B12345,
> India/Mumbai] ]

Can anyone please let me know how to get the desired output with any different approach irrespective of the number of input elements inside the list?

答案1

得分: 0

这不会生成嵌套的JSON,但我相信它应该能提取你想要的值,然后你可以从中形成JSON:

SELECT 
      t.value:active::BOOLEAN AS active
    , ARRAY_AGG(t.value:urls) WITHIN GROUP (ORDER BY seq) AS urls
    , SPLIT_PART(t.value:displayName::STRING, ',', 1) AS firstName
    , SPLIT_PART(t.value:displayName::STRING, ',', 2) AS lastName
    , t.value:emailAddress::STRING AS emailAddress
    , t.value:key::STRING AS key
    , t.value:name::STRING AS name
    , t.value:self::STRING AS self
    , t.value:timeZone::STRING AS timeZone
FROM snowflake_table
    , LATERAL FLATTEN(input => json_values) t
GROUP BY active, firstName, lastName, emailAddress, key, name, self, timeZone;

另一种使用JSON_PARSE的语法也是可能的:

SELECT 
      json_values:active::BOOLEAN AS active
    , ARRAY_AGG(json_values:urls) AS urls
    , json_values:displayName::STRING AS displayName
    , json_values:emailAddress::STRING AS emailAddress
    , json_values:key::STRING AS key
    , json_values:name::STRING AS name
    , json_values:self::STRING AS self
    , json_values:timeZone::STRING AS timeZone
FROM snowflake_table
    , LATERAL JSON_PARSE(json_values) AS json_values
GROUP BY active, displayName, emailAddress, key, name, self, timeZone;

你可能需要考虑两者,以便更容易应用最后一步并获得相对性能。使用JSON_VALUES更容易理解,但差异非常微小。

英文:

This won't produce the nested JSON, but it should I believe pick-out the values you want, and from there you should be able to form the JSON from it:

SELECT 
      t.value:active::BOOLEAN AS active
    , ARRAY_AGG(t.value:urls) WITHIN GROUP (ORDER BY seq) AS urls
    , SPLIT_PART(t.value:displayName::STRING, ',', 1) AS firstName
    , SPLIT_PART(t.value:displayName::STRING, ',', 2) AS lastName
    , t.value:emailAddress::STRING AS emailAddress
    , t.value:key::STRING AS key
    , t.value:name::STRING AS name
    , t.value:self::STRING AS self
    , t.value:timeZone::STRING AS timeZone
FROM snowflake_table
    , LATERAL FLATTEN(input => json_values) t
GROUP BY active, firstName, lastName, emailAddress, key, name, self, timeZone;

An alternative syntax via JSON_PARSE is also possible I feel:

SELECT 
      json_values:active::BOOLEAN AS active
    , ARRAY_AGG(json_values:urls) AS urls
    , json_values:displayName::STRING AS displayName
    , json_values:emailAddress::STRING AS emailAddress
    , json_values:key::STRING AS key
    , json_values:name::STRING AS name
    , json_values:self::STRING AS self
    , json_values:timeZone::STRING AS timeZone
FROM snowflake_table
    , LATERAL JSON_PARSE(json_values) AS json_values
GROUP BY active, displayName, emailAddress, key, name, self, timeZone;

You may want to consider both for ease of application of the final step and relative performance. It does feel like using JSON_VALUES is easier to understand but the difference in that is very a=marginal.

huangapple
  • 本文由 发表于 2023年6月2日 14:57:24
  • 转载请务必保留本文链接:https://go.coder-hub.com/76387822.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定