Regex capturing empty strings along with the expected groups; want it not to capture empty strings

Question

I built a regex that identifies a key in a JSON string and captures its value. However, along with the expected groups, it also captures empty strings.

Regex:

(?<=((?i)(finInstKey)":)["]?)(.*?)(?=["|,|}])|(?<="((?i)finInstKey","value":)["]?)(.*?)(?=["|,|}])

Input:

  1. {"finInstKey":500},{"name":"finInstKey","value":12345678900987654321}
  2. {finInstKey":"500"},{"name":"finInstKey","value":"12345678900987654321"}

For these inputs, input 2 also produces empty-string matches alongside the expected values.

Actual output (each blank line is one of the empty matches):

500
12345678900987654321

500

12345678900987654321

Expected output:

500
12345678900987654321
500
12345678900987654321

For now I filter them out manually in the Java code, but it would be nicer if the regex did not match empty strings at all. What changes should I make to the regex to get the expected output?

Mainly, I want to use this with replaceAll to mask the matched values with "****".

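My code:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTester {
    // Alternative 1 matches {"<field>":value}; alternative 2 matches {"name":"<field>","value":value}
    private static final String regex = "(?<=((?i)(%s)\":)[\"]?)(.*?)(?=[\"|,|}])|(?<=\"((?i)%s\",\"value\":)[\"]?)(.*?)(?=[\"|,|}])";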

    public static void main(String[] args) {
        String field = "finInstKey";
        String input = "{\"finInstKey\":500},{\"name\":\"finInstKey\",\"value\":12345678900987654321}{finInstKey\":\"500\"},{\"name\":\"finInstKey\",\"value\":\"12345678900987654321\"}";
        try {
            Pattern pattern = Pattern.compile(String.format(regex, field, field));
            Matcher matcher = pattern.matcher(input);
//            System.out.println(matcher.replaceAll("****"));
            while (matcher.find()) {
                System.out.println(matcher.group());
            }
        } catch (Exception e) {
            System.err.println(e);
        }

    }

}
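To see where the empty strings come from, here is a small diagnostic sketch (the class and method names are made up for illustration) that runs the same regex over the same input and prints each match with its start and end index. Every zero-length match should start right before an opening quote: the lookbehind's optional ["]? is satisfied without the quote, the lazy (.*?) then matches nothing, and the lookahead accepts the quote as the next character.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchPositions {
    // The question's regex and input, unchanged
    static final String REGEX = "(?<=((?i)(%s)\":)[\"]?)(.*?)(?=[\"|,|}])|(?<=\"((?i)%s\",\"value\":)[\"]?)(.*?)(?=[\"|,|}])";
    static final String INPUT = "{\"finInstKey\":500},{\"name\":\"finInstKey\",\"value\":12345678900987654321}{finInstKey\":\"500\"},{\"name\":\"finInstKey\",\"value\":\"12345678900987654321\"}";

    // Prints every match with its [start, end) range and returns the start index of each zero-length match
    static List<Integer> emptyMatchStarts(String field, String input) {
        Matcher m = Pattern.compile(String.format(REGEX, field, field)).matcher(input);
        List<Integer> starts = new ArrayList<>();
        while (m.find()) {
            System.out.printf("[%d, %d) \"%s\"%n", m.start(), m.end(), m.group());
            if (m.start() == m.end()) {
                starts.add(m.start());
            }
        }
        return starts;
    }

    public static void main(String[] args) {
        for (int start : emptyMatchStarts("finInstKey", INPUT)) {
            System.out.println("empty match at " + start + ", next char: " + INPUT.charAt(start));
        }
    }
}
```

Both empty matches come from input 2, where the value is quoted; input 1 has unquoted numbers, so the lookahead cannot succeed until the digits have been consumed.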

Answer 1

Score: 3

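It'd probably be easier to parse the JSON with a JSON parsing library instead of a regex.
Try the fromJson method of Gson: https://github.com/google/gson

If you insist on using a regex, look into the + quantifier, which means "match one or more". Your (.*?) uses *, which also allows zero characters, and that is where the empty matches come from. Regexes get pretty hard to read once they are as complex as yours.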
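A minimal sketch of the + suggestion, assuming values never contain a double quote, comma or closing brace. Simply changing (.*?) to (.+?) is not quite enough: at the position just before the opening quote, .+? would swallow the quote and return "500. A negated character class with + avoids both the empty match and the stray quote, and because the match is now only the value, replaceAll("****") masks it directly. The class and method names are illustrative, not from the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PlusQuantifierSketch {
    // Same two shapes as the question's regex, but the value is [^",}]+ instead of (.*?):
    // '+' forces at least one character, and excluding '"' keeps the opening quote out of the match
    private static final String REGEX =
            "(?<=%1$s\":\"?)[^\",}]+|(?<=\"%1$s\",\"value\":\"?)[^\",}]+";

    static Pattern compile(String field) {
        // Pattern.quote treats the field name literally; CASE_INSENSITIVE replaces the inline (?i)
        return Pattern.compile(String.format(REGEX, Pattern.quote(field)), Pattern.CASE_INSENSITIVE);
    }

    static List<String> values(String field, String input) {
        Matcher m = compile(field).matcher(input);
        List<String> found = new ArrayList<>();
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    static String mask(String field, String input) {
        return compile(field).matcher(input).replaceAll("****");
    }

    public static void main(String[] args) {
        String input = "{\"finInstKey\":500},{\"name\":\"finInstKey\",\"value\":12345678900987654321}"
                + "{finInstKey\":\"500\"},{\"name\":\"finInstKey\",\"value\":\"12345678900987654321\"}";
        System.out.println(values("finInstKey", input));
        System.out.println(mask("finInstKey", input));
    }
}
```

The trade-off is that a quoted value containing a comma or brace would be cut short, which is the same delimiter assumption the original lookahead already makes.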

Answer 2

Score: 0

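The finInstKey key in the second input is not enclosed in quotes, which leads to the empty matches. Making the quotes around the key optional (\"? on both sides of the field name) allows the pattern to match this input and extract the value correctly.

Use it like this: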

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {

    public static void main(String[] args) {
        String field = "finInstKey";
        // Quotes around the key are optional; group 2 holds the value of {"key":value},
        // group 3 the value of {"name":"key","value":value}
        String regex = "\"?" + field + "\"?(\\s*:\\s*\"?([^\",}]*)\"?|\",\"value\"\\s*:\\s*\"?([^\",}]*)\"?)";

        String input = "{\"finInstKey\":500},{\"name\":\"finInstKey\",\"value\":12345678900987654321}{finInstKey:\"500\"},{\"name\":\"finInstKey\",\"value\":\"12345678900987654321\"}";

        Pattern pattern = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
        Matcher matcher = pattern.matcher(input);

        while (matcher.find()) {
            // Exactly one of the two alternatives matched, so one of the groups is non-null
            if (matcher.group(2) != null) {
                System.out.println(matcher.group(2));
            } else {
                System.out.println(matcher.group(3));
            }
        }
    }
}

Here is the code.
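Because the question's end goal is masking rather than extraction, here is a sketch of how this answer's capture groups could drive the replacement. maskValues is a hypothetical helper, not part of the answer: it copies the input unchanged and swaps only the span of whichever group (2 or 3) matched for "****". Building the result by hand is needed because replaceAll would replace the whole match, including the key.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MaskByGroup {
    // Hypothetical helper: replaces only the captured value (group 2 or 3 of the
    // answer's pattern) with "****" and copies everything else unchanged
    static String maskValues(String input, String field) {
        String regex = "\"?" + field + "\"?(\\s*:\\s*\"?([^\",}]*)\"?|\",\"value\"\\s*:\\s*\"?([^\",}]*)\"?)";
        Matcher matcher = Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(input);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (matcher.find()) {
            int g = matcher.group(2) != null ? 2 : 3;
            // Copy the text up to the value, then the mask, then skip past the value
            out.append(input, last, matcher.start(g)).append("****");
            last = matcher.end(g);
        }
        return out.append(input.substring(last)).toString();
    }

    public static void main(String[] args) {
        String input = "{\"finInstKey\":500},{\"name\":\"finInstKey\",\"value\":12345678900987654321}"
                + "{finInstKey:\"500\"},{\"name\":\"finInstKey\",\"value\":\"12345678900987654321\"}";
        System.out.println(maskValues(input, "finInstKey"));
    }
}
```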

Answer 3

Score: 0

You can use the following pattern. The capture groups are 2 and 3.
It's not easy to determine where a value ends, considering a text value may contain any of the possible delimiters.
Make sure your data conforms: the pattern assumes each value is just a series of digits.

(?si)(\"finInstKey\")\s*:\s*\"?(.+?)\b.+?\"name\"\s*:\s*\1\s*,\s*\"value\"\s*:\s*\"?(.+?)\b
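A sketch of how the pattern above might be used from Java, assuming each input holds one finInstKey object followed by its name/value object. The class and method names are illustrative. Note that the pattern requires the quoted key "finInstKey", so the question's input 2 (missing the opening quote) does not match at all.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PairedPatternSketch {
    // The pattern above as a Java string literal; group 2 = finInstKey value, group 3 = "value" value
    static final Pattern PATTERN = Pattern.compile(
            "(?si)(\"finInstKey\")\\s*:\\s*\"?(.+?)\\b.+?\"name\"\\s*:\\s*\\1\\s*,\\s*\"value\"\\s*:\\s*\"?(.+?)\\b");

    // Returns {group 2, group 3}, or null if the input does not match
    static String[] extract(String json) {
        Matcher m = PATTERN.matcher(json);
        return m.find() ? new String[] {m.group(2), m.group(3)} : null;
    }

    public static void main(String[] args) {
        // Both inputs use the quoted key "finInstKey", which the pattern requires
        String a = "{\"finInstKey\":500},{\"name\":\"finInstKey\",\"value\":12345678900987654321}";
        String b = "{\"finInstKey\":\"500\"},{\"name\":\"finInstKey\",\"value\":\"12345678900987654321\"}";
        for (String json : new String[] {a, b}) {
            String[] v = extract(json);
            System.out.println(v[0] + " " + v[1]);
        }
    }
}
```

The greedy \"? consumes an opening quote before (.+?) starts, which is why quoted and unquoted values both yield just the digits; \b then stops the lazy match at the end of the digit run.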

That said, I'd recommend just using a JSON parsing library; Google's Gson works well.

Your JSON strings are actually arrays, so just place each one within square brackets.

[
  {
    "finInstKey": 500
  },
  {
    "name": "finInstKey",
    "value": 12345678900987654321
  }
]

Note that your second example is missing the opening quotation mark on the finInstKey key.

[
  {
    "finInstKey": "500"
  },
  {
    "name": "finInstKey",
    "value": "12345678900987654321"
  }
]

With Gson you can utilize the JsonParser class to parse the values.

import com.google.gson.JsonArray;
import com.google.gson.JsonElement;
import com.google.gson.JsonObject;
import com.google.gson.JsonParser;
import java.math.BigInteger;

public class GsonExample {
    public static void main(String[] args) {
        String stringA = "[\n" +
            "  {\n" +
            "    \"finInstKey\": 500\n" +
            "  },\n" +
            "  {\n" +
            "    \"name\": \"finInstKey\",\n" +
            "    \"value\": 12345678900987654321\n" +
            "  }\n" +
            "]";
        String stringB = "[\n" +
            "  {\n" +
            "    \"finInstKey\": \"500\"\n" +
            "  },\n" +
            "  {\n" +
            "    \"name\": \"finInstKey\",\n" +
            "    \"value\": \"12345678900987654321\"\n" +
            "  }\n" +
            "]";

        // Numeric values: read as int and BigInteger (the value exceeds the range of long).
        // The static JsonParser.parseString requires Gson 2.8.6 or later.
        JsonArray arrayA = JsonParser.parseString(stringA).getAsJsonArray();
        JsonObject objectA1 = arrayA.get(0).getAsJsonObject();
        JsonElement elementA1 = objectA1.get("finInstKey");
        int finInstKeyA = elementA1.getAsInt();
        JsonObject objectA2 = arrayA.get(1).getAsJsonObject();
        JsonElement elementA2 = objectA2.get("value");
        BigInteger valueA = elementA2.getAsBigInteger();
        System.out.println("finInstKeyA = " + finInstKeyA);
        System.out.println("valueA = " + valueA);

        // String values: read as String
        JsonArray arrayB = JsonParser.parseString(stringB).getAsJsonArray();
        JsonObject objectB1 = arrayB.get(0).getAsJsonObject();
        JsonElement elementB1 = objectB1.get("finInstKey");
        String finInstKeyB = elementB1.getAsString();
        JsonObject objectB2 = arrayB.get(1).getAsJsonObject();
        JsonElement elementB2 = objectB2.get("value");
        String valueB = elementB2.getAsString();
        System.out.println("finInstKeyB = " + finInstKeyB);
        System.out.println("valueB = " + valueB);
    }
}

Output

finInstKeyA = 500
valueA = 12345678900987654321
finInstKeyB = 500
valueB = 12345678900987654321

Answer 4

Score: 0

I think your regexp is not correct.

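import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    // Group 1: value of {"<field>":value}; group 2: value of {"name":"<field>","value":value}.
    // \d+ requires at least one digit, so no empty matches are produced.
    public static List<String> getData(String str, String field) {
        String regex = "(?:\"?" + field + "\"?:\"?(\\d+)\"?)|(?:\"name\":\""
                + field + "\",\"value\":\"?(\\d+)\"?)";
        Matcher matcher = Pattern.compile(regex).matcher(str);
        List<String> data = new ArrayList<>();

        while (matcher.find()) {
            data.add(Optional.ofNullable(matcher.group(1))
                             .orElseGet(() -> matcher.group(2)));
        }

        return data;
    }
}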

Output:

500
12345678900987654321
500
12345678900987654321

P.S. I think parsing JSON with regexps is a strategically bad idea. I recommend using any JSON parser (Jackson, Gson, ...).

huangapple
  • Posted on July 20, 2023 13:27:51
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76726907.html