正则表达式以停在第一个匹配的字符之一

huangapple go评论74阅读模式
英文:

Regular expression to stop at first match of one or the other character

问题

我正在Presto中解析一个JSON并在其上应用正则表达式。

{"value":"{\"123\":{\"type\":\"TAGGING\",\"role\":\"RESULT\",\"payload\":[{\"values\":{\"qbc\":\"not-abc\",\"def\":\"yes\",\"ghi\":\"gh\"}}]}}"}
    
用到的正则表达式:
REGEXP_EXTRACT(JSON_EXTRACT_SCALAR(
        Opfields,
        '$.value'
    ),
    '.*"abc":(.*?),',
    1
) abc

现在这对于abc有效,因为在abc值之后有一个逗号,但对于ghi则不起作用。对于ghi,我知道在这个示例中我可以使用}。但问题是,这些标量值(如abc、ghi)可以以不同的顺序出现,这种情况下可能会以逗号或}结尾。

我正在寻找一种无论顺序如何都能工作的表达式。实质上,我想要获取abc、def和ghi的值,无论它们在JSON中以什么顺序出现。

请问有人能提供一个好的方法吗?

我已经提到了我尝试过的正则表达式。

英文:

I am parsing a json in presto and applying regular expression on top of it.

{"value":"{\"123\":{\"type\":\"TAGGING\",\"role\":\"RESULT\",\"payload\":[{\"values\":{\"qbc\":\"not-abc\",\"def\":\"yes\",\"ghi\":\"gh\"}}]}}"}

Regexp used :
REGEXP_EXTRACT(JSON_EXTRACT_SCALAR(
        Opfields,
        '$.value'
    ),
    '.*"abc":(.*?),',
    1
) abc

Now this works for abc as there is a comma after abc value but does not work for ghi. For ghi I know in this example I could use } in this case. But the problem is that these scaler values like abc , ghi can occur in different order in which case it might end up with a comma or }.

I am looking for an expression which would work in every case no matter what the order is. Essentially I want the values for abc, def and ghi no matter in which order they appear in the json.

Can someone please advice a good way to do this.

I have mentioned the regexp I have tried

答案1

得分: 0

你不需要在这里使用正则表达式,根据显示的数据,你似乎只是有一个双重编码的 JSON(即 value 包含一个包含编码的 JSON 的 JSON 字符串),所以你可以再次解析它并将其处理为 JSON。使用 Presto(由于其有限的 JSON 路径支持),这可能会有点繁琐,涉及一些将其转换为映射和数组的操作:

-- 样例数据
with dataset(opfields) as(
    values ('{"value":"{\"123\":{\"type\":\"TAGGING\",\"role\":\"RESULT\",\"payload\":[{\"values\":{\"abc\":\"not-abc\",\"def\":\"yes\",\"ghi\":\"gh\"}}]}}"}')
)

-- 查询
select JSON_EXTRACT(js1, '$.values.abc') abc,
       JSON_EXTRACT(js1, '$.values.ghi') ghi
from dataset,
    unnest (cast(JSON_PARSE(JSON_EXTRACT_SCALAR(opfields, '$.value')) as map(varchar, json))) as t(k, js),
    unnest (cast(JSON_EXTRACT(js, '$.payload') as array(json))) as t1(js1);

输出:

   Abc   | ghi
-----------------+-----------
 "not-abc" | "gh"

如果你实际上使用的是 Trino,你可以使用 json_query

select json_query(JSON_EXTRACT_SCALAR(opfields, '$.value'), 'strict $.*.payload[*].values.abc' WITHOUT ARRAY WRAPPER) sensitivity,
       json_query(JSON_EXTRACT_SCALAR(opfields, '$.value'), 'strict $.*.payload[*].values.ghi' WITHOUT ARRAY WRAPPER) certainty
from dataset;

<details>
<summary>英文:</summary>

You don&#39;t need to use regex here, based on the shown data you seem to have just a double encoded JSON here (i.e. `value` contains a JSON string which holds an encoded JSON) so you can parse it again and process it as JSON. With Presto (due to it&#39;s limited JSON path support) it will be a bit cumbersome involving some casts to maps and arrays:

```sql
-- sample data
with dataset(opfields) as(
    values (&#39;{&quot;value&quot;:&quot;{\&quot;123\&quot;:{\&quot;type\&quot;:\&quot;TAGGING\&quot;,\&quot;role\&quot;:\&quot;RESULT\&quot;,\&quot;payload\&quot;:[{\&quot;values\&quot;:{\&quot;abc\&quot;:\&quot;not-abc\&quot;,\&quot;def\&quot;:\&quot;yes\&quot;,\&quot;ghi\&quot;:\&quot;gh\&quot;}}]}}&quot;}&#39;)
)

-- query
select JSON_EXTRACT(js1, &#39;$.values.abc&#39;) abc,
       JSON_EXTRACT(js1, &#39;$.values.ghi&#39;) ghi
from dataset,
    unnest (cast(JSON_PARSE(JSON_EXTRACT_SCALAR(opfields, &#39;$.value&#39;)) as map(varchar, json))) as t(k, js),
    unnest (cast(JSON_EXTRACT(js, &#39;$.payload&#39;) as array(json))) as t1(js1);

Output:

   Abc   | ghi
-----------------+-----------
 &quot;not-abc&quot; | &quot;gh&quot;

If you actually are using Trino you can use json_query:

select json_query(JSON_EXTRACT_SCALAR(opfields, &#39;$.value&#39;), &#39;strict $.*.payload[*].values.abc&#39; WITHOUT ARRAY WRAPPER) sensitivity,
       json_query(JSON_EXTRACT_SCALAR(opfields, &#39;$.value&#39;), &#39;strict $.*.payload[*].values.ghi&#39; WITHOUT ARRAY WRAPPER) certainty
from dataset;

huangapple
  • 本文由 发表于 2023年6月19日 07:22:12
  • 转载请务必保留本文链接:https://go.coder-hub.com/76502859.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定