Using JSON_EXTRACT or JSON_EXTRACT_SCALAR in Presto SQL or Scala

Question

id   value
123  {78kfcX={"Sent": 77, "Respond": 31, "NoResponse": 31}, 97Facz={"Sent": 45, "Respond": 31, "NoResponse": 31}}
333  {5mdzrZ={"Sent": 1, "Respond": 1, "NoResponset": 1}}

Given the table above, I am trying to extract the "Sent" value... Where an id has multiple "Sent" values, I want to take the max.

I have tried using json_extract, json_extract_scalar, json_parse and multiple other functions in SQL but nothing seems to work. I keep getting NULL values in all my attempts.

The expected outcome given the example above should be:

id Sent
123 77
333 1

I think one way to approach this is by first doing a CROSS JOIN UNNEST to split the value column by 78kfcX, 97Facz, 5mdzrZ ids. And then extracting the sent value from there and taking the max, grouped by the id column.

Attempted code:

SELECT 
id, 
json_extract_scalar(value, '$.Sent') AS 'sent'
FROM table

JSON_PARSE(value) returns the following error:
> Cannot convert value to JSON: '{78kfcX={"Sent": 77, "Respond": 31, "NoResponse": 31}, 97Facz={"Sent": 45, "Respond": 31, "NoResponse": 31}}'

Answer 1

Score: 1


Answer to the original question
The input string itself is not valid JSON. We therefore first need to extract each attribute map, such as {"Sent": 77, "Respond": 31, "NoResponse": 31}, with a regular expression, and then apply the JSON_EXTRACT_SCALAR() function to it.
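The claim that the value is not JSON can be checked outside Presto. The following is a minimal Python sketch, not Presto code: it uses the first row's value from the question, and the helper name `is_valid_json` is hypothetical. The outer string is rejected because its keys are unquoted and use `=` rather than `:`, while an inner attribute map parses without trouble.

```python
import json

# Value of the first row, copied from the question.
raw = ('{78kfcX={"Sent": 77, "Respond": 31, "NoResponse": 31}, '
       '97Facz={"Sent": 45, "Respond": 31, "NoResponse": 31}}')

# One inner attribute map taken from that value.
fragment = '{"Sent": 77, "Respond": 31, "NoResponse": 31}'


def is_valid_json(text: str) -> bool:
    """Return True if text parses as JSON, False otherwise (hypothetical helper)."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


raw_ok = is_valid_json(raw)            # outer map-style string
fragment_ok = is_valid_json(fragment)  # inner attribute map
sent = json.loads(fragment)["Sent"]    # value that '$.Sent' would address
```

This mirrors what the question reports: parsing the whole value fails, so the JSON functions only become useful once the inner maps are isolated.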

The following regex pattern extracts the attribute maps from the input (verified on Regex101):

\{".*?\}
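To see what the pattern matches, here is a small Python sketch that applies it to the two sample rows from the question. It assumes Python's `re` module treats this simple pattern the same way Presto's regex engine does; the names `PATTERN`, `rows` and `matches` are illustrative only.

```python
import re

# Same pattern as above: an opening brace followed by a double quote,
# then a lazy match up to the first closing brace.
PATTERN = re.compile(r'\{".*?\}')

# Sample rows copied from the question (id -> value).
rows = {
    123: ('{78kfcX={"Sent": 77, "Respond": 31, "NoResponse": 31}, '
          '97Facz={"Sent": 45, "Respond": 31, "NoResponse": 31}}'),
    333: '{5mdzrZ={"Sent": 1, "Respond": 1, "NoResponset": 1}}',
}

# findall returns every non-overlapping match, like regexp_extract_all().
matches = {row_id: PATTERN.findall(value) for row_id, value in rows.items()}

for row_id, found in matches.items():
    print(row_id, found)
```

The outer `{` is skipped because it is not followed by a double quote, and the lazy `.*?` stops at the first `}`. That is sufficient here because the inner maps contain no nested braces; nested objects would need a different approach.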

With all of the above, you can get the desired result in Presto in the following steps:

Step 1. Use the regexp_extract_all() function to extract all attribute maps from the input string into an array.

Step 2. Apply json_extract_scalar() to each element of the array to extract the Sent value.

Step 3. Use the array_max() function to get the maximum value.
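The three steps above can be traced outside Presto with a short Python sketch. It is only an analogue under stated assumptions: `re.findall` stands in for regexp_extract_all(), `json.loads(...)["Sent"]` for json_extract_scalar() with the path '$.Sent', and `max` for array_max(). The function name `max_sent` is hypothetical.

```python
import json
import re
from typing import Optional

PATTERN = re.compile(r'\{".*?\}')

# Sample rows copied from the question (id -> value).
rows = {
    123: ('{78kfcX={"Sent": 77, "Respond": 31, "NoResponse": 31}, '
          '97Facz={"Sent": 45, "Respond": 31, "NoResponse": 31}}'),
    333: '{5mdzrZ={"Sent": 1, "Respond": 1, "NoResponset": 1}}',
}


def max_sent(value: str) -> Optional[int]:
    """Return the largest Sent value in one row, or None if nothing matches."""
    # Step 1: pull every inner attribute map out of the string.
    fragments = PATTERN.findall(value)
    # Step 2: parse each fragment as JSON and read its Sent field.
    sent_values = [int(json.loads(fragment)["Sent"]) for fragment in fragments]
    # Step 3: take the maximum; default=None covers an empty list.
    return max(sent_values, default=None)


result = {row_id: max_sent(value) for row_id, value in rows.items()}
print(result)
```

No GROUP BY is needed in this approach, since every row already holds all of its Sent values and the maximum is taken per row.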

Here is the query in Presto:

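SELECT
    id,
    -- Take the largest Sent value found in the row
    ARRAY_MAX(
        TRANSFORM(
            -- Extract every inner {"...": ...} map as a string
            REGEXP_EXTRACT_ALL(value, '\{".*?\}'),
            -- Read Sent from each map and cast it to an integer
            v -> CAST(JSON_EXTRACT_SCALAR(v, '$.Sent') AS INT)
        )
    ) AS sent
FROM
    input_table

Result:

id  sent
123 77
333 1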

Answer to the follow-up
If we want to calculate the sum instead of the max over the array, we can use the reduce() function with lambda expressions.

Here is the query:

SELECT
    id,
    REDUCE(
        TRANSFORM(
            REGEXP_EXTRACT_ALL(value, '\{".*?\}'),
            v -> CAST(JSON_EXTRACT_SCALAR(v, '$.Sent') AS INT)
        ),
        0,
        (s, x) -> s + x,
        s -> s
    ) AS sent
FROM
    input_table

Result:

id  sent
123 122
333 1
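
The fold that reduce() performs can be illustrated with Python's `functools.reduce`. This is a sketch only: it reuses the sample rows from the question, and the name `sum_sent` is hypothetical. The initial value 0 and the lambda `s + x` correspond to the second and third arguments of the Presto call; Presto's final `s -> s` argument is an identity output function and has no counterpart here.

```python
import json
import re
from functools import reduce

PATTERN = re.compile(r'\{".*?\}')

# Sample rows copied from the question (id -> value).
rows = {
    123: ('{78kfcX={"Sent": 77, "Respond": 31, "NoResponse": 31}, '
          '97Facz={"Sent": 45, "Respond": 31, "NoResponse": 31}}'),
    333: '{5mdzrZ={"Sent": 1, "Respond": 1, "NoResponset": 1}}',
}


def sum_sent(value: str) -> int:
    """Sum the Sent values of one row by folding over the extracted maps."""
    sent_values = [int(json.loads(f)["Sent"]) for f in PATTERN.findall(value)]
    # Start from 0 and add each element in turn, as (s, x) -> s + x does.
    return reduce(lambda s, x: s + x, sent_values, 0)


totals = {row_id: sum_sent(value) for row_id, value in rows.items()}
print(totals)
```

For row 123 the fold computes 0 + 77 + 45, which matches the 122 shown in the result.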

huangapple
  • Posted on March 8, 2023, 14:42:49
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75670046.html