Regex to remove Key and value from JSON

Question

I have a JSON like below:

{"queueNumber": "123","UserId":[12,12,34],"cur":[{"objectName":"test","uniqueNumber":"123456"}]}

I want to remove every key-value pair whose key matches a given field name.

I tried the regex below, but it leaves a trailing comma behind, which raises an exception at runtime when this JSON is read again.

"((queueNumber|name|uniqueNumber)[0-9]*)":(\s*)?((".*?")|(\d+.\d+)|(\w+))[,]?

Current output:

{"UserId":[12,12,34],"cur":[{"objectName":"test",}]}

Expected output:

{"UserId":[12,12,34],"cur":[{"objectName":"test"}]}

How can I remove key-value pairs without leaving that extra comma behind?

Answer 1

Score: 1

I second the comments that you should probably use a JSON parser library for Java. For the sake of completeness, I worked out both approaches:

Regex approach

We need to consider the following cases:

  1. key-value pair is alone: {"queueNumber": "123"}
  2. key-value pair is at the beginning: {"queueNumber": "123","asdf": "1",...}
  3. key-value pair is in the middle: {"asdf1": "1","queueNumber": "123","asdf2": "1",...}
  4. key-value pair is at the end: {...,"asdf1": "1","queueNumber": "123"}

The difficulty is to avoid removing too many commas. We can sort the cases into two classes:

  1. the key-value pair starts with a comma
  2. the key-value pair doesn't start with a comma, but may(!) end with a comma
    "may end with a comma" could be considered a sub-class

If a key-value pair falls into either of these classes, we have a full match.

Regex for class 1: ,\s*"(queueNumber|name|uniqueNumber)\d*":\s*("[^"]*?"|\d+\.\d+|\w+)
Regex for class 2: "(queueNumber|name|uniqueNumber)\d*":\s*("[^"]*?"|\d+\.\d+|\w+)(\s*,)?

Full regex:
(,\s*"(queueNumber|name|uniqueNumber)\d*":\s*("[^"]*?"|\d+\.\d+|\w+))|("(queueNumber|name|uniqueNumber)\d*":\s*("[^"]*?"|\d+\.\d+|\w+)(\s*,)?)

Explanation:

  • ,\s* - matches a comma followed by zero or more whitespace characters
  • " - matches the literal char "
  • (queueNumber|name|uniqueNumber)\d* - matches either queueNumber, name or uniqueNumber, followed by optional digits
  • ":\s* - literal ": followed by optional whitespace
  • ("[^"]*?"|\d+\.\d+|\w+) - matches either anything inside quotation marks, a decimal number or a word (the dot is escaped as \. so that it matches only a literal dot)
    ** [^"]*? - lazy match of any string that does not contain " ([^abc] creates a character class with all chars that are not a, b or c). Lazy means the regex tries to match as few chars as possible.
  • (.a.)|(.b.) matches either the first (a) or the second (b) regex. If the first regex matches, the second regex is ignored.
  • (\s*,)? optionally matches optional whitespace followed by a comma
    ** ? - means 0 to 1 matches
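
As a rough illustration, the combined pattern can be applied in Java with Matcher.replaceAll. This is only a sketch under a few assumptions: the class and method names (RemoveKeysRegex, removeKeys) are made up for the example, the groups are written as non-capturing, and the decimal branch uses an escaped dot. It handles string, number and bare-word values only, not nested arrays or objects, and not strings that contain escaped quotes.

```java
import java.util.regex.Pattern;

final class RemoveKeysRegex {
    // Key names to strip, each optionally followed by digits (as in the regex above).
    private static final String KEY = "\"(?:queueNumber|name|uniqueNumber)\\d*\"";
    // Value: a quoted string without embedded quotes, a decimal number, or a bare word.
    private static final String VALUE = "(?:\"[^\"]*?\"|\\d+\\.\\d+|\\w+)";
    private static final String PAIR = KEY + ":\\s*" + VALUE;

    // Class 1: pair preceded by a comma. Class 2: pair optionally followed by a comma.
    private static final Pattern REMOVABLE =
            Pattern.compile(",\\s*" + PAIR + "|" + PAIR + "(?:\\s*,)?");

    /** Removes every matching key-value pair together with exactly one adjacent comma. */
    static String removeKeys(String json) {
        return REMOVABLE.matcher(json).replaceAll("");
    }

    public static void main(String[] args) {
        String json = "{\"queueNumber\": \"123\",\"UserId\":[12,12,34],"
                + "\"cur\":[{\"objectName\":\"test\",\"uniqueNumber\":\"123456\"}]}";
        System.out.println(removeKeys(json));
        // prints {"UserId":[12,12,34],"cur":[{"objectName":"test"}]}
    }
}
```

Because class 1 is tried first, a pair in the middle or at the end loses its leading comma, while a pair at the beginning falls through to class 2 and loses its trailing comma instead.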

Improvements to your regex approach:

  • You don't need that many groups (...). A group is only necessary when you want to reference it in the replacement, or to apply a quantifier or other logic to a sub-expression. The alternation operator | does not apply to single chars or matchers; it applies to the entire sequence that comes before and after it.
  • \d is the same as [0-9]
  • [,]? can be simplified to ,?
  • (\s*)? can be simplified to \s*. The * already means zero or more matches, so there is no need to make this part optional with the ? quantifier.
  • Your approach to matching the value part ((".*?"|\d+.\d+|\w+)) can get out of hand because of the .*? part. You rightly made it a lazy match, but if the rest of the regex does not match where you intended, laziness will not stop .*? from expanding as far as it needs to in order to produce a match. A better approach is to replace the . matcher with [^"]. That way the match cannot extend beyond the closing quote of the string.
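
The last point can be made concrete with a deliberately contrived sketch. The input, the mandatory trailing comma in the patterns, and the names LazyDotDemo and firstMatch are all assumptions chosen for illustration; they are not taken from the question. The lazy .*? runs past the closing quote of the first string to reach a comma, while [^"]* refuses to match at all.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LazyDotDemo {
    /** Returns the first match of regex in input, or null when nothing matches. */
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Two separate objects; the value of "a" is NOT followed by a comma.
        String input = "{\"a\":\"1\"},{\"b\":\"2\",\"c\":3}";

        // Lazy dot: expands across quotes until it finds a quote followed by a comma.
        System.out.println(firstMatch("\"a\":\".*?\",", input));
        // prints "a":"1"},{"b":"2",

        // Negated class: cannot cross the closing quote, so there is no match.
        System.out.println(firstMatch("\"a\":\"[^\"]*\",", input));
        // prints null
    }
}
```

In a replaceAll call, the first pattern would delete part of the second object, which is the kind of silent damage the negated character class avoids.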

JSON parser approach

Take a look at the following libraries:

If you don't know the exact structure of the JSON, I recommend using JsonPath to remove the keys. But you need to be extra careful not to remove more than you want to. See this example:

Maven dependency:

<dependency>
  <groupId>com.jayway.jsonpath</groupId>
  <artifactId>json-path</artifactId>
  <version>2.8.0</version>
</dependency>

Java code (minimum Java 15 required):

import com.jayway.jsonpath.DocumentContext;
import com.jayway.jsonpath.JsonPath;
import com.jayway.jsonpath.Option;
import com.jayway.jsonpath.Configuration;

public class App {
    public static void main(String[] args) {
        String details = """
            {"queueNumber": "123","UserId":[12,12,34],"cur":[{"objectName":"test","uniqueNumber":"123456"}]}
            """;

        DocumentContext parsedJson = JsonPath.parse(details,
                // exceptions are thrown when a key is not present. For a simpler demo
                // we suppress all exceptions. Be cautious how to use this implementation
                // in production though!
                Configuration.defaultConfiguration().addOptions(Option.SUPPRESS_EXCEPTIONS));

        parsedJson.delete("$..queueNumber");
        parsedJson.delete("$..name");
        parsedJson.delete("$..uniqueNumber");

        System.out.println(parsedJson.jsonString());
    }
}

Now, if you are certain about the structure of the given JSON, you can narrow the paths to the following:

        parsedJson.delete("$.queueNumber");
        parsedJson.delete("$..name");
        parsedJson.delete("$.cur[*].uniqueNumber");

You'd only need to adapt the path for the key "name". As it was missing from your example, I don't know where it usually occurs. If you get stuck using the library, check the path examples section of the readme file; it's well documented: https://github.com/json-path/JsonPath#path-examples

huangapple
  • Posted on May 29, 2023, 21:43:14
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76357894.html