英文:
Snowflake Flatten and parsing values
问题
倒序后的翻译如下:
]] B12345 ,B12345 , ,Lname1, Name1 , [https://url.url.com/rest/api/2/user?username=B12345 ,B12345 , , ,Lname1, Name1 ,[https://url.url.com/secure/?ownerId=B12345&id=123 ,https://url.url.com/secure/?size=medium&ownerId=B12345&id=123 ,https://url.url.com/secure/?size=small&ownerId=B12345&id=123 ,https://url.url.com/secure/?size=xsmall&ownerId=B12345&id=123 ,false,]] ,C12345 ,C12345 , ,Lname3. ,Name1 , [India/Mumbai ,https://url.url.com/rest/api/2/user?username=B12345 ,C12345 , , ,Lname3. ,Name1 ,[https://url.url.com/secure/?ownerId=C12345&id=456 ,https://url.url.com/secure/?size=medium&ownerId=C12345&id=456 ,https://url.url.com/secure/?size=small&ownerId=C12345&id=456 ,https://url.url.com/secure/?size=xsmall&ownerId=C12345&id=456 ,true,]] ,A12345 ,A12345 ,name.lname@abcdef.com ,Lname2. ,Name1 , [India/Mumbai ,https://url.url.com/rest/api/2/user?username=B12345 ,A12345 ,name.lname@abcdef.com , ,Lname2. ,Name1 ,[https://url.url.com/secure/?ownerId=A12345&id=456 ,https://url.url.com/secure/?size=medium&ownerId=A12345&id=456 ,https://url.url.com/secure/?size=small&ownerId=A12345&id=456 ,https://url.url.com/secure/?size=xsmall&ownerId=A12345&id=456 ,true,]]]
这是你提供的内容的倒序翻译。如果需要进一步的帮助,请随时提出。
英文:
I have a snowflake table that has VARCHAR column containing input from api. I have 300+ columns to be flattened and one of them has input like below for a particular row. I need to parse the values from the below (please refer output) and store it as a single row for a particular input row. The number of elements inside the input list might be 1 or more than 1 but the order of the keys present remain same.
Input:
[
{"active":false,"urls":{"6x6":"https://url.url.com/secure/?size=xsmall&ownerId=B12345&id=123","4x4":"https://url.url.com/secure/?size=small&ownerId=B12345&id=123","22x22":"https://url.url.com/secure/?size=medium&ownerId=B12345&id=123","44x44":"https://url.url.com/secure/?ownerId=B12345&id=123"},"displayName":"Name1,Lname1","emailAddress":"","key":"B12345","name":"B12345","self":"https://url.url.com/rest/api/2/user?username=B12345","timeZone":"India/Mumbai"},
{"active":true,"urls":{"6x6":"https://url.url.com/secure/?size=xsmall&ownerId=A12345&id=456","4x4":"https://url.url.com/secure/?size=small&ownerId=A12345&id=456","22x22":"https://url.url.com/secure/?size=medium&ownerId=A12345&id=456","44x44":"https://url.url.com/secure/?ownerId=A12345&id=456"},"displayName":"Name1,Lname2.","emailAddress":"name.lname@abcdef.com","key":"A12345","name":"A12345","self":"https://url.url.com/rest/api/2/user?username=A12345","timeZone":"India/Mumbai"}
{"active":true,"urls":{"6x6":"https://url.url.com/secure/?size=xsmall&ownerId=C12345&id=456","4x4":"https://url.url.com/secure/?size=small&ownerId=C12345&id=456","22x22":"https://url.url.com/secure/?size=medium&ownerId=C12345&id=456","44x44":"https://url.url.com/secure/?ownerId=C12345&id=456"},"displayName":"Name1,Lname3.","emailAddress":"name.lname@abcdef.com","key":"C12345","name":"C12345","self":"https://url.url.com/rest/api/2/user?username=C12345","timeZone":"India/Mumbai"}
]
Output I am looking for is: list of values of the keys (nested)
[
[false,
[https://url.url.com/secure/?size=xsmall&ownerId=B12345&id=123,
https://url.url.com/secure/?size=small&ownerId=B12345&id=123,
https://url.url.com/secure/?size=medium&ownerId=B12345&id=123,
https://url.url.com/secure/?ownerId=B12345&id=123], Name1, Lname1, ,
B12345, B12345, https://url.url.com/rest/api/2/user?username=B12345,
India/Mumbai],
[true,
[https://url.url.com/secure/?size=xsmall&ownerId=A12345&id=456,
https://url.url.com/secure/?size=small&ownerId=A12345&id=456,
https://url.url.com/secure/?size=medium&ownerId=A12345&id=456,
https://url.url.com/secure/?ownerId=A12345&id=456], Name1, Lname2.,
name.lname@abcdef.com, A12345,
A12345,https://url.url.com/rest/api/2/user?username=B12345,
India/Mumbai] ,
[true,
[https://url.url.com/secure/?size=xsmall&ownerId=C12345&id=456,
https://url.url.com/secure/?size=small&ownerId=C12345&id=456,
https://url.url.com/secure/?size=medium&ownerId=C12345&id=456,
https://url.url.com/secure/?ownerId=C12345&id=456], Name1, Lname3.,
name.lname@abcdef.com, C12345,
C12345,https://url.url.com/rest/api/2/user?username=B12345,
India/Mumbai]
]
Query that I tried:
SELECT
CONCAT('[[',
REPLACE(GET(flattened.value, 'active'), '"', '') , ', ',
'[',
listagg(CASE WHEN flattened_nested.value LIKE '%https%' THEN 'https:' || REPLACE(SPLIT_PART(flattened_nested.value, ':', 2), '"', '') ELSE NULL END,',')within group (order by null)
,']', ', ', --<== this causes issue while trying to parse urls values for each elements and to store them.
REPLACE(GET(flattened.value, 'displayName'), '"', '') , ', ',
REPLACE(GET(flattened.value, 'emailAddress'), '"', '') , ', ',
REPLACE(GET(flattened.value, 'key'), '"', '') , ', ',
REPLACE(GET(flattened.value, 'name'), '"', '') , ', ',
REPLACE(GET(flattened.value, 'self'), '"', '') , ', ',
REPLACE(GET(flattened.value, 'timeZone'), '"', ''),
']]') as output_column
FROM snowflake_table SRC
,LATERAL FLATTEN(input=>SRC.json_values:"fields":"field_12345") AS flattened --<== the field that contains the above input.
,LATERAL FLATTEN(input=>flattened.value:urls) AS flattened_nested
group by flattened.value
;
-
The below output that I get is aggregating all the 3 urls value for a particular input and stores it as comma separated. but, it doesn't take the other values like displayname, key,name, self etc for all 3 elements. It gives me only the first occurrence.
-
The other issue with this approach is if I need to include other 300+ columns, I have place everything in group by clause.
> [ [false,
> [https://url.url.com/secure/?size=xsmall&ownerId=B12345&id=123,
> https://url.url.com/secure/?size=small&ownerId=B12345&id=123,
> https://url.url.com/secure/?size=medium&ownerId=B12345&id=123,
> https://url.url.com/secure/?ownerId=B12345&id=123,https://url.url.com/secure/?size=xsmall&ownerId=A12345&id=456,
> https://url.url.com/secure/?size=small&ownerId=A12345&id=456,
> https://url.url.com/secure/?size=medium&ownerId=A12345&id=456,
> https://url.url.com/secure/?ownerId=A12345&id=456,https://url.url.com/secure/?size=xsmall&ownerId=C12345&id=456,
> https://url.url.com/secure/?size=small&ownerId=C12345&id=456,
> https://url.url.com/secure/?size=medium&ownerId=C12345&id=456,
> https://url.url.com/secure/?ownerId=C12345&id=456], Name1, Lname1, ,
> B12345, B12345, https://url.url.com/rest/api/2/user?username=B12345,
> India/Mumbai] ]
Can anyone please let me know how to get the desired output with any different approach irrespective of the number of input elements inside the list?
答案1
得分: 0
这不会生成嵌套的JSON,但我相信它应该能提取你想要的值,然后你可以从中形成JSON:
SELECT
t.value:active::BOOLEAN AS active
, ARRAY_AGG(t.value:urls) WITHIN GROUP (ORDER BY seq) AS urls
, SPLIT_PART(t.value:displayName::STRING, ',', 1) AS firstName
, SPLIT_PART(t.value:displayName::STRING, ',', 2) AS lastName
, t.value:emailAddress::STRING AS emailAddress
, t.value:key::STRING AS key
, t.value:name::STRING AS name
, t.value:self::STRING AS self
, t.value:timeZone::STRING AS timeZone
FROM snowflake_table
, LATERAL FLATTEN(input => json_values) t
GROUP BY active, firstName, lastName, emailAddress, key, name, self, timeZone;
另一种使用JSON_PARSE的语法也是可能的:
SELECT
json_values:active::BOOLEAN AS active
, ARRAY_AGG(json_values:urls) AS urls
, json_values:displayName::STRING AS displayName
, json_values:emailAddress::STRING AS emailAddress
, json_values:key::STRING AS key
, json_values:name::STRING AS name
, json_values:self::STRING AS self
, json_values:timeZone::STRING AS timeZone
FROM snowflake_table
, LATERAL JSON_PARSE(json_values) AS json_values
GROUP BY active, displayName, emailAddress, key, name, self, timeZone;
你可能需要考虑两者,以便更容易应用最后一步并获得相对性能。使用JSON_VALUES更容易理解,但差异非常微小。
英文:
This won't produce the nested JSON, but it should I believe pick-out the values you want, and from there you should be able to form the JSON from it:
SELECT
t.value:active::BOOLEAN AS active
, ARRAY_AGG(t.value:urls) WITHIN GROUP (ORDER BY seq) AS urls
, SPLIT_PART(t.value:displayName::STRING, ',', 1) AS firstName
, SPLIT_PART(t.value:displayName::STRING, ',', 2) AS lastName
, t.value:emailAddress::STRING AS emailAddress
, t.value:key::STRING AS key
, t.value:name::STRING AS name
, t.value:self::STRING AS self
, t.value:timeZone::STRING AS timeZone
FROM snowflake_table
, LATERAL FLATTEN(input => json_values) t
GROUP BY active, firstName, lastName, emailAddress, key, name, self, timeZone;
An alternative syntax via JSON_PARSE is also possible I feel:
SELECT
json_values:active::BOOLEAN AS active
, ARRAY_AGG(json_values:urls) AS urls
, json_values:displayName::STRING AS displayName
, json_values:emailAddress::STRING AS emailAddress
, json_values:key::STRING AS key
, json_values:name::STRING AS name
, json_values:self::STRING AS self
, json_values:timeZone::STRING AS timeZone
FROM snowflake_table
, LATERAL JSON_PARSE(json_values) AS json_values
GROUP BY active, displayName, emailAddress, key, name, self, timeZone;
You may want to consider both for ease of application of the final step and relative performance. It does feel like using JSON_VALUES is easier to understand but the difference in that is very a=marginal.
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论