Using JSON_EXTRACT or JSON_EXTRACT_SCALAR in Presto SQL or Scala
Question
id | value |
---|---|
123 | {78kfcX={"Sent": 77, "Respond": 31, "NoResponse": 31}, 97Facz={"Sent": 45, "Respond": 31, "NoResponse": 31}} |
333 | {5mdzrZ={"Sent": 1, "Respond": 1, "NoResponset": 1}} |
Given the table above, I am trying to extract the "Sent" value... When a row has multiple "Sent" values, I want to take the max.
I have tried using json_extract, json_extract_scalar, json_parse and multiple other functions in SQL but nothing seems to work. I keep getting NULL values in all my attempts.
The expected outcome given the example above should be:
id | Sent |
---|---|
123 | 77 |
333 | 1 |
I think one way to approach this is by first doing a CROSS JOIN UNNEST to split the value column by 78kfcX, 97Facz, 5mdzrZ ids. And then extracting the sent value from there and taking the max, grouped by the id column.
Attempted code:
SELECT
id,
json_extract_scalar(value, '$.Sent') AS 'sent'
FROM table
JSON_PARSE(value) returns the following error:
> Cannot convert value to JSON: '{78kfcX={"Sent": 77, "Respond": 31, "NoResponse": 31}, 97Facz={"Sent": 45, "Respond": 31, "NoResponse": 31}}'
Answer 1
Score: 1
Answer to the original question
The input string itself is not valid JSON. Therefore, we first need to pull out each attribute map, such as {"Sent": 77, "Respond": 31, "NoResponse": 31}, with a regular expression, and then apply the JSON_EXTRACT_SCALAR() function to it.
The following regex pattern extracts the attribute maps from the input (verified on Regex101):
\{".*?\}
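To see what the pattern captures inside Presto itself, here is a minimal sketch. It assumes the two rows from the question, inlined with VALUES so the query runs without a real table; the column alias attribute_maps is only illustrative.

```sql
-- Sketch: show the array of attribute maps the regex finds in each row.
-- The VALUES rows are copied from the question and stand in for the real table.
SELECT
    id,
    REGEXP_EXTRACT_ALL(value, '\{".*?\}') AS attribute_maps
FROM
    (VALUES
        (123, '{78kfcX={"Sent": 77, "Respond": 31, "NoResponse": 31}, 97Facz={"Sent": 45, "Respond": 31, "NoResponse": 31}}'),
        (333, '{5mdzrZ={"Sent": 1, "Respond": 1, "NoResponset": 1}}')
    ) AS input_table (id, value)
```

The outer `{` is skipped because it is not followed by a double quote, and the lazy `.*?` stops at the first `}`, so id 123 should yield two elements and id 333 one, each of which is valid JSON on its own.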
Putting these pieces together, you can get the result you want in Presto with the following steps:
Step 1. Use the regexp_extract_all() function to extract all attribute maps from the input string into an array.
Step 2. Apply json_extract_scalar() to each element of the array to extract the Sent value.
Step 3. Use the array_max() function to get the max value you want.
Here is the query in Presto:
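SELECT
    id,
    -- Step 3: take the largest "Sent" value in the array
    ARRAY_MAX(
        -- Step 2: turn each attribute map into its integer "Sent" value
        TRANSFORM(
            -- Step 1: collect every attribute map into an array
            REGEXP_EXTRACT_ALL(value, '\{".*?\}'),
            v -> CAST(JSON_EXTRACT_SCALAR(v, '$.Sent') AS INT)
        )
    ) AS sent
FROM
    input_table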
id | Sent |
---|---|
123 | 77 |
333 | 1 |
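The CROSS JOIN UNNEST idea from the question works as well. The following is only a sketch of that variant, not part of the original answer: it reuses the same regex, inlines the question's two rows with VALUES, and uses the illustrative aliases t, m and attribute_map.

```sql
-- Sketch: one output row per attribute map, then aggregate per id.
SELECT
    t.id,
    MAX(CAST(JSON_EXTRACT_SCALAR(m.attribute_map, '$.Sent') AS INTEGER)) AS sent
FROM
    (VALUES
        (123, '{78kfcX={"Sent": 77, "Respond": 31, "NoResponse": 31}, 97Facz={"Sent": 45, "Respond": 31, "NoResponse": 31}}'),
        (333, '{5mdzrZ={"Sent": 1, "Respond": 1, "NoResponset": 1}}')
    ) AS t (id, value)
-- Expand the array of attribute maps into rows
CROSS JOIN UNNEST(REGEXP_EXTRACT_ALL(t.value, '\{".*?\}')) AS m (attribute_map)
GROUP BY
    t.id
```

One difference to keep in mind: an id whose value contains no attribute map produces an empty array, and CROSS JOIN UNNEST drops that id from the output entirely, whereas the lambda-based query still returns a row for it.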
Answer to the follow-up question
If we want to calculate the sum instead of the max of the array, we can use the reduce() function with a lambda expression.
Here is the query:
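SELECT
    id,
    REDUCE(
        TRANSFORM(
            REGEXP_EXTRACT_ALL(value, '\{".*?\}'),
            v -> CAST(JSON_EXTRACT_SCALAR(v, '$.Sent') AS INT)
        ),
        0,                 -- initial state of the running sum
        (s, x) -> s + x,   -- add each "Sent" value to the state
        s -> s             -- return the final state unchanged
    ) AS sent
FROM
    input_table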
id | Sent |
---|---|
123 | 122 |
333 | 1 |
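One caveat with the sum: if an attribute map has no "Sent" key, JSON_EXTRACT_SCALAR() returns NULL, and adding NULL to the running state makes the whole sum NULL. A possible guard, sketched below, wraps the element in COALESCE. The row with id 999 is a made-up example added only to show that case; it is not part of the question's data.

```sql
-- Sketch: NULL-safe sum of "Sent" values per row.
SELECT
    id,
    REDUCE(
        TRANSFORM(
            REGEXP_EXTRACT_ALL(value, '\{".*?\}'),
            v -> CAST(JSON_EXTRACT_SCALAR(v, '$.Sent') AS INTEGER)
        ),
        0,
        (s, x) -> s + COALESCE(x, 0),  -- count a missing "Sent" as 0
        s -> s
    ) AS sent
FROM
    (VALUES
        (123, '{78kfcX={"Sent": 77, "Respond": 31, "NoResponse": 31}, 97Facz={"Sent": 45, "Respond": 31, "NoResponse": 31}}'),
        (333, '{5mdzrZ={"Sent": 1, "Respond": 1, "NoResponset": 1}}'),
        (999, '{abc123={"Respond": 2, "NoResponse": 2}}')  -- hypothetical row without "Sent"
    ) AS input_table (id, value)
```

With this guard the hypothetical row 999 sums to 0 rather than NULL. Whether a missing value should count as 0 or stay NULL depends on what the report is meant to show.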