Java regular expression to split a string by dashes positioned outside of double curly braces

Question

I have some strings that I want to split on dashes, but only where the dash is outside double curly braces, as below:

{{abc-def}}-123-benefit-{{ghi}} should break into {{abc-def}}, 123, benefit, {{ghi}}

abc-{{123-def}}-benefit-{{ghi}} should break into abc, {{123-def}}, benefit, {{ghi}}

I am trying to use a regex for this, but I have not been able to get it working since I am a novice with regex. Based on some answers I found, I tried the following, but did not get the expected output:

String regex = "-(?!\\{\\{^\\{\\{\\}\\}*\\}\\})";
String newRegex = "-(?:[^,\"]|\"[^\"]*\")+-";
String abc = "{{abc-def}}-123-benefit-{{def}}";
String[] output1 = abc.split(regex);
String[] output2 = abc.split(newRegex);

Getting output1 as

["{{abc", "def}}", "123", "benefit", "{{def}}"]

and output2 as

["{{abc", "{{def}}"]

Expected output is

["{{abc-def}}", "123", "benefit", "{{def}}"]

Answer 1

Score: 1

Instead of splitting, you could match the tokens with the regular expression

(?:\{\{.*?\}\}|[^{}\-]+)

The expression can be broken down as follows.

(?:         begin a non-capturing group
  \{\{      match the literal '{{'
  .*?       lazily match zero or more characters other than line terminators
  \}\}      match the literal '}}'
|           or
  [^{}\-]+  match one or more characters other than '{', '}' or '-'
)           end the non-capturing group
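
A minimal sketch of how this expression might be applied in Java, assuming each match is collected as one token with Matcher.find() rather than by splitting. The class and method names (MatchTokens, tokens) are hypothetical and only serve the illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchTokens {
    // Regex from the answer above: a whole {{...}} group,
    // or a run of characters containing no brace and no dash.
    private static final Pattern TOKEN =
            Pattern.compile("(?:\\{\\{.*?\\}\\}|[^{}\\-]+)");

    // Collects every match of TOKEN, in order, as one token each.
    static List<String> tokens(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        // The two sample strings from the question.
        System.out.println(tokens("{{abc-def}}-123-benefit-{{ghi}}"));
        System.out.println(tokens("abc-{{123-def}}-benefit-{{ghi}}"));
    }
}
```

The second alternative cannot start at a brace, so a {{...}} group is always consumed whole by the first alternative, dashes included.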

Answer 2

Score: 1

Your regex will be simpler if you use a Matcher instead of split. Note that in your first regex the ^ is not inside a character class, so it functions as a start-of-input anchor. The lookahead body can therefore never match, which means the negative lookahead succeeds at every dash and every dash is split on.

With a Matcher you can use this regex:

\{\{.*?\}\}|[^-]+

In code:

import java.util.*;
import java.util.regex.*;

// ...

        // Match either a whole {{...}} group or a run of non-dash characters.
        String regex = "\\{\\{.*?\\}\\}|[^-]+";
        String abc = "{{abc-def}}-123-benefit-{{def}}";
        ArrayList<String> output = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(abc);
        while (m.find()) {
            output.add(m.group());
        }
        // output now holds [{{abc-def}}, 123, benefit, {{def}}]
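
The point about ^ can be checked with a short sketch. It assumes the first regex exactly as posted in the question and compares it with a plain split on "-". The class and method names (AnchorCheck, behavesLikePlainDash) are made up for the demonstration:

```java
import java.util.Arrays;

public class AnchorCheck {
    // First regex from the question, copied as posted:
    // the ^ sits outside any character class, so it is an anchor.
    static final String QUESTION_REGEX = "-(?!\\{\\{^\\{\\{\\}\\}*\\}\\})";

    // Returns true when the question's regex splits the input
    // exactly like a plain "-" would.
    static boolean behavesLikePlainDash(String input) {
        String[] withLookahead = input.split(QUESTION_REGEX);
        String[] plainDash = input.split("-");
        return Arrays.equals(withLookahead, plainDash);
    }

    public static void main(String[] args) {
        String abc = "{{abc-def}}-123-benefit-{{def}}";
        // prints [{{abc, def}}, 123, benefit, {{def}}]
        System.out.println(Arrays.toString(abc.split(QUESTION_REGEX)));
        // prints true
        System.out.println(behavesLikePlainDash(abc));
    }
}
```

Since ^ can only match at the start of the input, the lookahead body fails at every dash, and the result is the same as output1 in the question.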


Answer 3

Score: 0

This is kind of a non-answer, but it's taken you an hour to get a regex answer, and this took me 20 minutes including some light testing and comments. It's a bit more complicated than I thought, though perhaps that could be improved.

I've added some light comments to indicate what I'm doing. Adding some Javadoc to the parse() method would also make it more readable.

If you find it helpful, feel free to copy.

package stackoverflow;

import java.util.ArrayList;
import java.util.List;

/**
 * @author Brenden
 */
public class ParseDashBrace {

   public static void main( String[] args ) {
      List<String> test1 = parse( "{{abc-def}}-123-benefit-{{ghi}}" );
      System.out.println( "Test1 " + test1 );
      List<String> test2 = parse( "abc-{{123-def}}-benefit-{{ghi}}" );
      System.out.println( "Test2 " + test2 );
   }

   static List<String> parse( String line ) {
      ArrayList<String> tokens = new ArrayList<>();
      boolean inBrace = false;
      boolean firstBrace = false;
      int start = 0;
      int end = 0;
      // parse the string, split on - and {{}}
      for( ; end < line.length(); end++ ) {
         char c = line.charAt( end );
         if( !inBrace ) {
            // if we are not inside braces, then we look for "-"
            if( c == '-' ) {
               if( start != end )
                  tokens.add( line.substring( start, end) );
               start = end+1;
            } else if( c == '{' ) {
               // we also need to know when we find a brace and are inside
               inBrace = true;
            }
         } else {
            if( c == '}' ) {
               if( firstBrace ) {
                  // we had a first brace and now we found another, two braces total
                  tokens.add( line.substring( start, end+1 ) );
                  start = end+1;
                  firstBrace = false;
                  inBrace = false;
               } else {
                  // this is just the first end brace, flag it, wait for next one
                  firstBrace = true;
               }
            }
         }
      }
      if( start != end )
         tokens.add( line.substring( start, end ) );
      return tokens;
   }
}
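
Following up on the remark that the method could be improved and documented, here is one possible variant, a sketch rather than a drop-in replacement. It assumes braces always come in balanced {{ }} pairs and, unlike the code above, it only ends a token at a dash, not right after }}. The names BraceAwareSplitter and splitOutsideBraces are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

public class BraceAwareSplitter {

    /**
     * Splits a line on '-' characters that are outside {{...}} groups.
     * Empty tokens (for example from a leading dash) are dropped.
     * Assumption: every "{{" has a matching "}}" and groups do not nest.
     *
     * @param line the text to split
     * @return the tokens in order of appearance
     */
    static List<String> splitOutsideBraces(String line) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inBraces = false;
        int i = 0;
        while (i < line.length()) {
            if (!inBraces && line.startsWith("{{", i)) {
                // entering a {{...}} group
                inBraces = true;
                current.append("{{");
                i += 2;
            } else if (inBraces && line.startsWith("}}", i)) {
                // leaving the group
                inBraces = false;
                current.append("}}");
                i += 2;
            } else if (!inBraces && line.charAt(i) == '-') {
                // a dash outside braces ends the current token
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
                i++;
            } else {
                // any other character, including a dash inside braces
                current.append(line.charAt(i));
                i++;
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(splitOutsideBraces("{{abc-def}}-123-benefit-{{ghi}}"));
        System.out.println(splitOutsideBraces("abc-{{123-def}}-benefit-{{ghi}}"));
    }
}
```

Checking for the two-character sequences "{{" and "}}" with startsWith removes the need for a separate firstBrace flag.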

Answer 4

Score: 0

Alternative:

Regex used:

"(?<!\\{\\{\\w{1,100})-"

See the regex in context, with a small test bench, below:

import java.util.Arrays;

// ...

public static void main(String[] args) {
    String input1 = "{{abc-def}}-123-benefit-{{ghi}}";
    String input2 = "abc-{{123-def}}-benefit-{{ghi}}";

    // split on a dash that is not preceded by "{{" plus 1 to 100 word characters
    String regex = "(?<!\\{\\{\\w{1,100})-";

    System.out.println("Result of input1: " + Arrays.asList(input1.split(regex)));
    System.out.println("Result of input2: " + Arrays.asList(input2.split(regex)));
}

Output:

Result of input1: [{{abc-def}}, 123, benefit, {{ghi}}]
Result of input2: [abc, {{123-def}}, benefit, {{ghi}}]

Comment:

Used Pattern:
(?<!X)	X, via zero-width negative lookbehind

Read more about Pattern here: https://docs.oracle.com/javase/10/docs/api/java/util/regex/Pattern.html

Other tools for testing regex: https://regex101.com/
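
One thing to be aware of: the lookbehind only allows word characters between {{ and the dash, so it covers a single dash per {{...}} group, preceded only by letters, digits or underscores. The sketch below compares it with a negative lookahead, which is a different technique and not part of this answer. It assumes braces are balanced and only appear as {{...}} pairs. The class, constant and method names are illustrative:

```java
import java.util.Arrays;
import java.util.List;

public class SplitComparison {
    // Regex from the answer above (negative lookbehind).
    static final String LOOKBEHIND = "(?<!\\{\\{\\w{1,100})-";

    // Assumed alternative, not from the answer: a dash that is NOT
    // followed by brace-free text and then "}}", i.e. not inside a group.
    static final String LOOKAHEAD = "-(?![^{}]*\\}\\})";

    static List<String> split(String input, String regex) {
        return Arrays.asList(input.split(regex));
    }

    public static void main(String[] args) {
        // Two dashes inside one group.
        String twoDashes = "{{a-b-c}}-x";
        System.out.println(split(twoDashes, LOOKBEHIND)); // [{{a-b, c}}, x]
        System.out.println(split(twoDashes, LOOKAHEAD));  // [{{a-b-c}}, x]

        // The sample strings from the question, with the lookahead variant.
        System.out.println(split("{{abc-def}}-123-benefit-{{ghi}}", LOOKAHEAD));
        System.out.println(split("abc-{{123-def}}-benefit-{{ghi}}", LOOKAHEAD));
    }
}
```

The lookahead variant has its own limits: a stray single brace or an unclosed {{ would change which dashes are treated as being inside a group.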

huangapple
  • Posted on March 1, 2023, 09:50:50
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75598900.html