How to split a string (based on a variety of delimiters) but without keeping whitespace?

Question

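You can use the following Java code to tokenize the string without keeping any whitespace:

    List<String> tokens = Arrays.stream("((!A1)&(B2|C3))".split("((?<=[!&()|])|(?=[!&()|]))"))
        .map(String::trim)
        .filter(token -> !token.isEmpty())
        .collect(Collectors.toList());

This code uses split() to break the string into tokens, trims each token with map(String::trim), and then uses filter() to drop every token that is empty after trimming. The result is a list of tokens with no whitespace-only entries and no padding around the variables.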

I have Java strings which are boolean expressions with parentheses, &, |, and ! as operators, and I want to split them into tokens. For example:

((!A1)&(B2|C3)) should become "(","(","!","A1",")","&","(","B2","|","C3",")",")"

Following this answer I found that I can use Java's String.split() with a regex that includes lookahead and lookbehind clauses:

  String[] tokens = "((!A1)&(B2|C3))".split("((?<=[!&()|])|(?=[!&()|]))");

My only problem is that whitespace will be included in the list of tokens. For example, if I were to write the expression as ( ( !A1 ) & ( B2 | C3 ) ) then my split() would produce at least four strings like " " and there'd be padding around my variables (e.g. " B2 ").
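
To make the problem concrete, the following is a minimal sketch that runs the lookaround split on the spaced-out expression and prints each token in quotes. The class name `SplitWhitespaceDemo` and the helper `splitKeepingDelimiters` are hypothetical names chosen for illustration; only `String.split()` and the regex come from the text above.

```java
class SplitWhitespaceDemo {
    // Splits before and after every operator or parenthesis, keeping the delimiters as tokens.
    static String[] splitKeepingDelimiters(String expression) {
        return expression.split("((?<=[!&()|])|(?=[!&()|]))");
    }

    public static void main(String[] args) {
        String[] tokens = splitKeepingDelimiters("( ( !A1 ) & ( B2 | C3 ) )");
        // Quote each token so that whitespace-only and padded tokens are visible.
        for (String token : tokens) {
            System.out.println("\"" + token + "\"");
        }
    }
}
```

With that input, the printed tokens include whitespace-only strings such as " " and padded ones such as " B2 ", which is the behavior the question wants to avoid.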

How can I modify this split expression and regex to tokenize the string but not keep any of the whitespace?

Answer 1

Score: 1

Instead of split, you can use this regex to match the tokens you want:

    [!&()|]|[^!&()|\h]+

**RegEx Details:**

- `[!&()|]`: Match `!` or `&` or `(` or `)` or `|`
- `|`: OR
- `[^!&()|\h]+`: Match one or more characters that are NOT `!`, `&`, `(`, `)`, `|` or horizontal whitespace

**Code:**

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Match either a single operator/parenthesis or a run of other non-whitespace characters
final String regex = "[!&()|]|[^!&()|\\h]+";
final String string = "((!A1)&( B2 | C3 ))";
final Pattern pattern = Pattern.compile(regex);
final Matcher matcher = pattern.matcher(string);
List<String> result = new ArrayList<>();
// Each successful find() yields the next token; whitespace is never matched, so it is skipped
while (matcher.find()) {
    result.add(matcher.group(0));
}
System.out.println(result); // [(, (, !, A1, ), &, (, B2, |, C3, ), )]
```
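
The matching approach above can also be wrapped in a small reusable method. The sketch below is one possible way to do that, not the answerer's code: the class name `BooleanTokenizer` and the method `tokenize` are hypothetical, it assumes Java 9 or later for `Matcher.results()`, and it lists `|` in both character classes so that an unspaced `B2|C3` still splits into three tokens.

```java
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class BooleanTokenizer {
    // One operator/parenthesis, or a run of characters that are neither
    // operators nor horizontal whitespace.
    private static final Pattern TOKEN = Pattern.compile("[!&()|]|[^!&()|\\h]+");

    // Returns the tokens in order; whitespace is never matched, so it is dropped.
    static List<String> tokenize(String expression) {
        return TOKEN.matcher(expression)
                .results()               // Java 9+: stream of successive matches
                .map(MatchResult::group) // text of each match
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(tokenize("( ( !A1 ) & ( B2 | C3 ) )"));
        System.out.println(tokenize("((!A1)&(B2|C3))"));
    }
}
```

Both calls should print the same twelve tokens, since the spacing of the input no longer affects the result.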

huangapple
  • Published on August 6, 2020, 00:33:21
  • When reposting, please keep the link to this article: https://go.coder-hub.com/63269591.html