英文:
Regular expression to stop at first match of one or the other character
问题
我正在Presto中解析一个JSON并在其上应用正则表达式。
{"value":"{\"123\":{\"type\":\"TAGGING\",\"role\":\"RESULT\",\"payload\":[{\"values\":{\"qbc\":\"not-abc\",\"def\":\"yes\",\"ghi\":\"gh\"}}]}}"}
用到的正则表达式:
REGEXP_EXTRACT(JSON_EXTRACT_SCALAR(
Opfields,
'$.value'
),
'.*"abc":(.*?),',
1
) abc
现在这对于abc有效,因为在abc值之后有一个逗号,但对于ghi则不起作用。对于ghi,我知道在这个示例中我可以使用}。但问题是,这些标量值(如abc、ghi)可以以不同的顺序出现,这种情况下可能会以逗号或}结尾。
我正在寻找一种无论顺序如何都能工作的表达式。实质上,我想要获取abc、def和ghi的值,无论它们在JSON中以什么顺序出现。
请问有人能提供一个好的方法吗?
我已经提到了我尝试过的正则表达式。
英文:
I am parsing a json
in presto and applying regular expression on top of it.
{"value":"{\"123\":{\"type\":\"TAGGING\",\"role\":\"RESULT\",\"payload\":[{\"values\":{\"qbc\":\"not-abc\",\"def\":\"yes\",\"ghi\":\"gh\"}}]}}"}
Regexp used :
REGEXP_EXTRACT(JSON_EXTRACT_SCALAR(
Opfields,
'$.value'
),
'.*"abc":(.*?),',
1
) abc
Now this works for abc as there is a comma after abc value but does not work for ghi. For ghi I know in this example I could use } in this case. But the problem is that these scaler values like abc , ghi can occur in different order in which case it might end up with a comma or }.
I am looking for an expression which would work in every case no matter what the order is. Essentially I want the values for abc, def and ghi no matter in which order they appear in the json.
Can someone please advice a good way to do this.
I have mentioned the regexp I have tried
答案1
得分: 0
你不需要在这里使用正则表达式,根据显示的数据,你似乎只是有一个双重编码的 JSON(即 value
包含一个包含编码的 JSON 的 JSON 字符串),所以你可以再次解析它并将其处理为 JSON。使用 Presto(由于其有限的 JSON 路径支持),这可能会有点繁琐,涉及一些将其转换为映射和数组的操作:
-- 样例数据
with dataset(opfields) as(
values ('{"value":"{\"123\":{\"type\":\"TAGGING\",\"role\":\"RESULT\",\"payload\":[{\"values\":{\"abc\":\"not-abc\",\"def\":\"yes\",\"ghi\":\"gh\"}}]}}"}')
)
-- 查询
select JSON_EXTRACT(js1, '$.values.abc') abc,
JSON_EXTRACT(js1, '$.values.ghi') ghi
from dataset,
unnest (cast(JSON_PARSE(JSON_EXTRACT_SCALAR(opfields, '$.value')) as map(varchar, json))) as t(k, js),
unnest (cast(JSON_EXTRACT(js, '$.payload') as array(json))) as t1(js1);
输出:
Abc | ghi
-----------------+-----------
"not-abc" | "gh"
如果你实际上使用的是 Trino,你可以使用 json_query
:
select json_query(JSON_EXTRACT_SCALAR(opfields, '$.value'), 'strict $.*.payload[*].values.abc' WITHOUT ARRAY WRAPPER) sensitivity,
json_query(JSON_EXTRACT_SCALAR(opfields, '$.value'), 'strict $.*.payload[*].values.ghi' WITHOUT ARRAY WRAPPER) certainty
from dataset;
<details>
<summary>英文:</summary>
You don't need to use regex here, based on the shown data you seem to have just a double encoded JSON here (i.e. `value` contains a JSON string which holds an encoded JSON) so you can parse it again and process it as JSON. With Presto (due to it's limited JSON path support) it will be a bit cumbersome involving some casts to maps and arrays:
```sql
-- sample data
with dataset(opfields) as(
values ('{"value":"{\"123\":{\"type\":\"TAGGING\",\"role\":\"RESULT\",\"payload\":[{\"values\":{\"abc\":\"not-abc\",\"def\":\"yes\",\"ghi\":\"gh\"}}]}}"}')
)
-- query
select JSON_EXTRACT(js1, '$.values.abc') abc,
JSON_EXTRACT(js1, '$.values.ghi') ghi
from dataset,
unnest (cast(JSON_PARSE(JSON_EXTRACT_SCALAR(opfields, '$.value')) as map(varchar, json))) as t(k, js),
unnest (cast(JSON_EXTRACT(js, '$.payload') as array(json))) as t1(js1);
Output:
Abc | ghi
-----------------+-----------
"not-abc" | "gh"
If you actually are using Trino you can use json_query
:
select json_query(JSON_EXTRACT_SCALAR(opfields, '$.value'), 'strict $.*.payload[*].values.abc' WITHOUT ARRAY WRAPPER) sensitivity,
json_query(JSON_EXTRACT_SCALAR(opfields, '$.value'), 'strict $.*.payload[*].values.ghi' WITHOUT ARRAY WRAPPER) certainty
from dataset;
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论