雪花平铺和解析数值

huangapple go评论91阅读模式
英文:

Snowflake Flatten and parsing values

问题

倒序后的翻译如下:

  1. ]] B12345 ,B12345 , ,Lname1, Name1 , [https://url.url.com/rest/api/2/user?username=B12345 ,B12345 , , ,Lname1, Name1 ,[https://url.url.com/secure/?ownerId=B12345&id=123 ,https://url.url.com/secure/?size=medium&ownerId=B12345&id=123 ,https://url.url.com/secure/?size=small&ownerId=B12345&id=123 ,https://url.url.com/secure/?size=xsmall&ownerId=B12345&id=123 ,false,]] ,C12345 ,C12345 , ,Lname3. ,Name1 , [India/Mumbai ,https://url.url.com/rest/api/2/user?username=B12345 ,C12345 , , ,Lname3. ,Name1 ,[https://url.url.com/secure/?ownerId=C12345&id=456 ,https://url.url.com/secure/?size=medium&ownerId=C12345&id=456 ,https://url.url.com/secure/?size=small&ownerId=C12345&id=456 ,https://url.url.com/secure/?size=xsmall&ownerId=C12345&id=456 ,true,]] ,A12345 ,A12345 ,name.lname@abcdef.com ,Lname2. ,Name1 , [India/Mumbai ,https://url.url.com/rest/api/2/user?username=B12345 ,A12345 ,name.lname@abcdef.com , ,Lname2. ,Name1 ,[https://url.url.com/secure/?ownerId=A12345&id=456 ,https://url.url.com/secure/?size=medium&ownerId=A12345&id=456 ,https://url.url.com/secure/?size=small&ownerId=A12345&id=456 ,https://url.url.com/secure/?size=xsmall&ownerId=A12345&id=456 ,true,]]]

这是你提供的内容的倒序翻译。如果需要进一步的帮助,请随时提出。

英文:

I have a snowflake table that has VARCHAR column containing input from api. I have 300+ columns to be flattened and one of them has input like below for a particular row. I need to parse the values from the below (please refer output) and store it as a single row for a particular input row. The number of elements inside the input list might be 1 or more than 1 but the order of the keys present remain same.

Input:

  1. [
  2. {"active":false,"urls":{"6x6":"https://url.url.com/secure/?size=xsmall&ownerId=B12345&id=123","4x4":"https://url.url.com/secure/?size=small&ownerId=B12345&id=123","22x22":"https://url.url.com/secure/?size=medium&ownerId=B12345&id=123","44x44":"https://url.url.com/secure/?ownerId=B12345&id=123"},"displayName":"Name1,Lname1","emailAddress":"","key":"B12345","name":"B12345","self":"https://url.url.com/rest/api/2/user?username=B12345","timeZone":"India/Mumbai"},
  3. {"active":true,"urls":{"6x6":"https://url.url.com/secure/?size=xsmall&ownerId=A12345&id=456","4x4":"https://url.url.com/secure/?size=small&ownerId=A12345&id=456","22x22":"https://url.url.com/secure/?size=medium&ownerId=A12345&id=456","44x44":"https://url.url.com/secure/?ownerId=A12345&id=456"},"displayName":"Name1,Lname2.","emailAddress":"name.lname@abcdef.com","key":"A12345","name":"A12345","self":"https://url.url.com/rest/api/2/user?username=A12345","timeZone":"India/Mumbai"}
  4. {"active":true,"urls":{"6x6":"https://url.url.com/secure/?size=xsmall&ownerId=C12345&id=456","4x4":"https://url.url.com/secure/?size=small&ownerId=C12345&id=456","22x22":"https://url.url.com/secure/?size=medium&ownerId=C12345&id=456","44x44":"https://url.url.com/secure/?ownerId=C12345&id=456"},"displayName":"Name1,Lname3.","emailAddress":"name.lname@abcdef.com","key":"C12345","name":"C12345","self":"https://url.url.com/rest/api/2/user?username=C12345","timeZone":"India/Mumbai"}
  5. ]

Output I am looking for is: list of values of the keys (nested)

  1. [
  2. [false,
  3. [https://url.url.com/secure/?size=xsmall&ownerId=B12345&id=123,
  4. https://url.url.com/secure/?size=small&ownerId=B12345&id=123,
  5. https://url.url.com/secure/?size=medium&ownerId=B12345&id=123,
  6. https://url.url.com/secure/?ownerId=B12345&id=123], Name1, Lname1, ,
  7. B12345, B12345, https://url.url.com/rest/api/2/user?username=B12345,
  8. India/Mumbai],
  9. [true,
  10. [https://url.url.com/secure/?size=xsmall&ownerId=A12345&id=456,
  11. https://url.url.com/secure/?size=small&ownerId=A12345&id=456,
  12. https://url.url.com/secure/?size=medium&ownerId=A12345&id=456,
  13. https://url.url.com/secure/?ownerId=A12345&id=456], Name1, Lname2.,
  14. name.lname@abcdef.com, A12345,
  15. A12345,https://url.url.com/rest/api/2/user?username=B12345,
  16. India/Mumbai] ,
  17. [true,
  18. [https://url.url.com/secure/?size=xsmall&ownerId=C12345&id=456,
  19. https://url.url.com/secure/?size=small&ownerId=C12345&id=456,
  20. https://url.url.com/secure/?size=medium&ownerId=C12345&id=456,
  21. https://url.url.com/secure/?ownerId=C12345&id=456], Name1, Lname3.,
  22. name.lname@abcdef.com, C12345,
  23. C12345,https://url.url.com/rest/api/2/user?username=B12345,
  24. India/Mumbai]
  25. ]

Query that I tried:

  1. SELECT
  2. CONCAT('[[',
  3. REPLACE(GET(flattened.value, 'active'), '"', '') , ', ',
  4. '[',
  5. listagg(CASE WHEN flattened_nested.value LIKE '%https%' THEN 'https:' || REPLACE(SPLIT_PART(flattened_nested.value, ':', 2), '"', '') ELSE NULL END,',')within group (order by null)
  6. ,']', ', ', --<== this causes issue while trying to parse urls values for each elements and to store them.
  7. REPLACE(GET(flattened.value, 'displayName'), '"', '') , ', ',
  8. REPLACE(GET(flattened.value, 'emailAddress'), '"', '') , ', ',
  9. REPLACE(GET(flattened.value, 'key'), '"', '') , ', ',
  10. REPLACE(GET(flattened.value, 'name'), '"', '') , ', ',
  11. REPLACE(GET(flattened.value, 'self'), '"', '') , ', ',
  12. REPLACE(GET(flattened.value, 'timeZone'), '"', ''),
  13. ']]') as output_column
  14. FROM snowflake_table SRC
  15. ,LATERAL FLATTEN(input=>SRC.json_values:"fields":"field_12345") AS flattened --<== the field that contains the above input.
  16. ,LATERAL FLATTEN(input=>flattened.value:urls) AS flattened_nested
  17. group by flattened.value
  18. ;
  • The below output that I get is aggregating all the 3 urls value for a particular input and stores it as comma separated. but, it doesn't take the other values like displayname, key,name, self etc for all 3 elements. It gives me only the first occurrence.

  • The other issue with this approach is if I need to include other 300+ columns, I have place everything in group by clause.

> [ [false,
> [https://url.url.com/secure/?size=xsmall&ownerId=B12345&id=123,
> https://url.url.com/secure/?size=small&ownerId=B12345&id=123,
> https://url.url.com/secure/?size=medium&ownerId=B12345&id=123,
> https://url.url.com/secure/?ownerId=B12345&id=123,https://url.url.com/secure/?size=xsmall&ownerId=A12345&id=456,
> https://url.url.com/secure/?size=small&ownerId=A12345&id=456,
> https://url.url.com/secure/?size=medium&ownerId=A12345&id=456,
> https://url.url.com/secure/?ownerId=A12345&id=456,https://url.url.com/secure/?size=xsmall&ownerId=C12345&id=456,
> https://url.url.com/secure/?size=small&ownerId=C12345&id=456,
> https://url.url.com/secure/?size=medium&ownerId=C12345&id=456,
> https://url.url.com/secure/?ownerId=C12345&id=456], Name1, Lname1, ,
> B12345, B12345, https://url.url.com/rest/api/2/user?username=B12345,
> India/Mumbai] ]

Can anyone please let me know how to get the desired output with any different approach irrespective of the number of input elements inside the list?

答案1

得分: 0

这不会生成嵌套的JSON,但我相信它应该能提取你想要的值,然后你可以从中形成JSON:

  1. SELECT
  2. t.value:active::BOOLEAN AS active
  3. , ARRAY_AGG(t.value:urls) WITHIN GROUP (ORDER BY seq) AS urls
  4. , SPLIT_PART(t.value:displayName::STRING, ',', 1) AS firstName
  5. , SPLIT_PART(t.value:displayName::STRING, ',', 2) AS lastName
  6. , t.value:emailAddress::STRING AS emailAddress
  7. , t.value:key::STRING AS key
  8. , t.value:name::STRING AS name
  9. , t.value:self::STRING AS self
  10. , t.value:timeZone::STRING AS timeZone
  11. FROM snowflake_table
  12. , LATERAL FLATTEN(input => json_values) t
  13. GROUP BY active, firstName, lastName, emailAddress, key, name, self, timeZone;

另一种使用JSON_PARSE的语法也是可能的:

  1. SELECT
  2. json_values:active::BOOLEAN AS active
  3. , ARRAY_AGG(json_values:urls) AS urls
  4. , json_values:displayName::STRING AS displayName
  5. , json_values:emailAddress::STRING AS emailAddress
  6. , json_values:key::STRING AS key
  7. , json_values:name::STRING AS name
  8. , json_values:self::STRING AS self
  9. , json_values:timeZone::STRING AS timeZone
  10. FROM snowflake_table
  11. , LATERAL JSON_PARSE(json_values) AS json_values
  12. GROUP BY active, displayName, emailAddress, key, name, self, timeZone;

你可能需要考虑两者,以便更容易应用最后一步并获得相对性能。使用JSON_VALUES更容易理解,但差异非常微小。

英文:

This won't produce the nested JSON, but it should I believe pick-out the values you want, and from there you should be able to form the JSON from it:

  1. SELECT
  2. t.value:active::BOOLEAN AS active
  3. , ARRAY_AGG(t.value:urls) WITHIN GROUP (ORDER BY seq) AS urls
  4. , SPLIT_PART(t.value:displayName::STRING, ',', 1) AS firstName
  5. , SPLIT_PART(t.value:displayName::STRING, ',', 2) AS lastName
  6. , t.value:emailAddress::STRING AS emailAddress
  7. , t.value:key::STRING AS key
  8. , t.value:name::STRING AS name
  9. , t.value:self::STRING AS self
  10. , t.value:timeZone::STRING AS timeZone
  11. FROM snowflake_table
  12. , LATERAL FLATTEN(input => json_values) t
  13. GROUP BY active, firstName, lastName, emailAddress, key, name, self, timeZone;

An alternative syntax via JSON_PARSE is also possible I feel:

  1. SELECT
  2. json_values:active::BOOLEAN AS active
  3. , ARRAY_AGG(json_values:urls) AS urls
  4. , json_values:displayName::STRING AS displayName
  5. , json_values:emailAddress::STRING AS emailAddress
  6. , json_values:key::STRING AS key
  7. , json_values:name::STRING AS name
  8. , json_values:self::STRING AS self
  9. , json_values:timeZone::STRING AS timeZone
  10. FROM snowflake_table
  11. , LATERAL JSON_PARSE(json_values) AS json_values
  12. GROUP BY active, displayName, emailAddress, key, name, self, timeZone;

You may want to consider both for ease of application of the final step and relative performance. It does feel like using JSON_VALUES is easier to understand but the difference in that is very a=marginal.

huangapple
  • 本文由 发表于 2023年6月2日 14:57:24
  • 转载请务必保留本文链接:https://go.coder-hub.com/76387822.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定