Split a string using multiple patterns, where the second pattern matches smaller parts of the first

Question

I'm reading special "formatting codes" in a string and am trying to split the string so that I have those formatting codes and the string's text separated.

There are two "types" of formatting codes: "encoded" hex colors such as §x§7§3§7§5§f§f, and other codes in the format of §r.

Given the example string: §x§7§3§7§5§f§f§ltest1 §rtest2

I need the larger pattern split as a whole, and then the smaller ones. I can do what I want on those patterns separately, but am having trouble combining them into a single regex. Because the second pattern matches pieces of the first pattern, it's just splitting everything into smaller groups.

I'm trying this:

for (String substr : "§x§7§3§7§5§f§f§ltest1 §rtest2".split("((?<=(§x(§[0-9a-f]){6}))|(?<=§[0-9a-z])|(?=§[0-9a-z]))")) {
  System.out.println(substr);
}

My expected output is:

§x§7§3§7§5§f§f
§l
test1
§r
test2

My actual output is:

§x
§7
§3
§7
§5
§f
§f
§l
test1
§r
test2

When I run each expression in its own split call, it works; they just don't work when combined.
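The over-splitting has a specific cause: split cuts wherever any alternative matches, and both (?<=§[0-9a-z]) and (?=§[0-9a-z]) also match at every pair boundary inside the hex color. One possible way to keep split, sketched here and not taken from the original posts, is to guard both lookarounds with a negative lookbehind that rejects positions inside a hex color. The class name GuardedSplitDemo and the pattern itself are assumptions for illustration. The sketch relies on Java accepting bounded-length lookbehinds such as {0,5}.

```java
import java.util.Arrays;
import java.util.List;

class GuardedSplitDemo {
    // Hypothetical pattern, not from the original question or answers.
    // The negative lookbehind rejects any position that sits directly after
    // "§x" plus zero to five hex pairs, i.e. anywhere inside a hex color.
    // After the sixth pair the guard no longer applies, so the color ends there.
    static final String SPLIT_REGEX =
        "(?<!§x(?:§[0-9a-f]){0,5})(?:(?<=§[0-9a-z])|(?=§[0-9a-z]))";

    static List<String> split(String input) {
        return Arrays.asList(input.split(SPLIT_REGEX));
    }

    public static void main(String[] args) {
        // Brackets make whitespace inside each part visible.
        for (String part : split("§x§7§3§7§5§f§f§ltest1 §rtest2")) {
            System.out.println("[" + part + "]");
        }
        // prints:
        // [§x§7§3§7§5§f§f]
        // [§l]
        // [test1 ]
        // [§r]
        // [test2]
    }
}
```

Because split only cuts and never discards characters, the space after test1 stays attached to that part. Call trim() on the text parts if that is not wanted.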

Answer 1

Score: 2

Instead of splitting, you could just use this simplified regex for matching:

§x(?:§[0-9a-f]){6}|§[0-9a-z]|[^§\s]+

RegEx Details:

  • §x(?:§[0-9a-f]){6}: Match §x followed by exactly six §-prefixed hex digits (an encoded hex color); it is listed first so that it wins over the shorter alternative
  • |: OR
  • §[0-9a-z]: Match § followed by one digit or lowercase letter
  • |: OR
  • [^§\s]+: Match 1+ characters that are neither whitespace nor §
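The order of the alternatives matters: a regex engine tries them left to right at each position and takes the first one that succeeds. A small sketch (the class AlternationOrderDemo and the helper findAll are names made up for this illustration) compares the regex above with a reordered version that puts the short alternative first:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AlternationOrderDemo {
    // Collects every match of regex in input, in order of appearance.
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "§x§7§3§7§5§f§f§ltest1 §rtest2";

        // Hex color alternative first: the whole color is one match.
        System.out.println(findAll("§x(?:§[0-9a-f]){6}|§[0-9a-z]|[^§\\s]+", input));
        // prints [§x§7§3§7§5§f§f, §l, test1, §r, test2]

        // Short alternative first: "§x" already matches it, so the color is
        // broken into seven two-character codes.
        System.out.println(findAll("§[0-9a-z]|§x(?:§[0-9a-f]){6}|[^§\\s]+", input));
        // prints [§x, §7, §3, §7, §5, §f, §f, §l, test1, §r, test2]
    }
}
```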

Code:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        final String regex = "§x(?:§[0-9a-f]){6}|§[0-9a-z]|[^§\\s]+";
        final String string = "§x§7§3§7§5§f§f§ltest1 §rtest2";

        final Pattern pattern = Pattern.compile(regex);
        final Matcher matcher = pattern.matcher(string);

        // Print each formatting code and each text run on its own line
        while (matcher.find()) {
            System.out.println(matcher.group(0));
        }
    }
}
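One thing to be aware of with this regex: the last alternative, [^§\s]+, excludes whitespace, so spaces are dropped and text such as "hello world" comes back as two separate matches. If text runs should stay whole, a possible variant is to use [^§]+ instead. The sketch below assumes that requirement. The class FormatTokenizer and its tokenize method are hypothetical names, and Matcher.results() needs Java 9 or later.

```java
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class FormatTokenizer {
    // Assumption: text runs may contain spaces and should be kept whole,
    // so the last alternative is [^§]+ rather than the answer's [^§\s]+.
    private static final Pattern TOKEN =
        Pattern.compile("§x(?:§[0-9a-f]){6}|§[0-9a-z]|[^§]+");

    // Returns formatting codes and text runs in order of appearance.
    static List<String> tokenize(String input) {
        return TOKEN.matcher(input).results()
                .map(MatchResult::group)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Brackets make whitespace inside each token visible.
        tokenize("§lhello world §rbye")
                .forEach(t -> System.out.println("[" + t + "]"));
        // prints:
        // [§l]
        // [hello world ]
        // [§r]
        // [bye]
    }
}
```

With either variant, a § that is not followed by a digit or lowercase letter matches no alternative and is silently skipped.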

Answer 2

Score: 1

You can use the following regex:

 ?((?:§[^§])(?=[^§])|[^§ ]{2,})

How it works:

  • " ?" (a space followed by ?) optionally match one space character
  • ((?:§[^§])(?=[^§])|[^§ ]{2,}) capture either of the following:
    • (?:§[^§])(?=[^§]) match the following:
      • (?:§[^§]) match § followed by any character except §
      • (?=[^§]) lookahead requiring that the next character exists and is not § (unlike (?!§), it fails at the end of the string)
    • [^§ ]{2,} match two or more characters that are neither § nor a space

Replace each match with \n$1 (a newline followed by the captured group).

Result:

§x§7§3§7§5§f§f
§l
test1
§r
test2
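In Java, this substitution could be applied with String.replaceAll, as in the sketch below (SubstitutionDemo and breakLines are hypothetical names). The regex never matches the hex color itself. It leaves the color untouched only because every code inside it is followed by another §. The second call shows the consequence: when a hex color is followed directly by text, its last pair is split off.

```java
class SubstitutionDemo {
    // Regex from Answer 2. Note the leading space before the '?'.
    static final String REGEX = " ?((?:§[^§])(?=[^§])|[^§ ]{2,})";

    // Puts a newline in front of every captured code or text run.
    static String breakLines(String input) {
        return input.replaceAll(REGEX, "\n$1");
    }

    public static void main(String[] args) {
        System.out.println(breakLines("§x§7§3§7§5§f§f§ltest1 §rtest2"));
        // prints:
        // §x§7§3§7§5§f§f
        // §l
        // test1
        // §r
        // test2

        System.out.println("---");

        // Hex color followed directly by text: the final "§f" is split off.
        System.out.println(breakLines("§x§7§3§7§5§f§ftest"));
        // prints:
        // §x§7§3§7§5§f
        // §f
        // test
    }
}
```

The {2,} quantifier also means that a text run of a single character is never matched, so it does not get its own line.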

huangapple
  • Posted on August 10, 2020 at 23:10:12
  • When reposting, please keep the link to this article: https://go.coder-hub.com/63342914.html