How to split a string (based on a variety of delimiters) but without keeping whitespace?

Question

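You can keep the original split regex and post-process its result with the following Java code, so that the string is tokenized without keeping any whitespace:

    List<String> tokens = Arrays.stream("((!A1)&(B2|C3))".split("((?<=[!&()|])|(?=[!&()|]))"))
                              .map(String::trim)
                              .filter(token -> !token.isEmpty())
                              .collect(Collectors.toList());

This code uses `split()` to break the string into tokens, `map(String::trim)` to strip the padding around each token, and `filter()` to discard every token that is empty after trimming. The result is a list of tokens that contains no whitespace. It needs `java.util.Arrays`, `java.util.List`, and `java.util.stream.Collectors`.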
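A variant of the same idea is to remove the whitespace before splitting, so that no trimming or filtering is needed afterwards. The sketch below is only an illustration: the class name `WhitespaceFreeSplit` and the method `tokenize` are hypothetical, it assumes that variable names never contain whitespace, and it assumes Java 8 or later (where a zero-width match at the start of the input does not produce a leading empty string).

```java
import java.util.Arrays;
import java.util.List;

class WhitespaceFreeSplit {
    // Zero-width split points: right after or right before any operator character.
    private static final String AROUND_OPERATORS = "(?<=[!&()|])|(?=[!&()|])";

    // Hypothetical helper: drops all whitespace, then splits around the operators.
    static List<String> tokenize(String expression) {
        String compact = expression.replaceAll("\\s+", "");
        return Arrays.asList(compact.split(AROUND_OPERATORS));
    }

    public static void main(String[] args) {
        // prints [(, (, !, A1, ), &, (, B2, |, C3, ), )]
        System.out.println(tokenize("( ( !A1 ) & ( B2 | C3 ) )"));
    }
}
```

Because the whitespace is gone before `split()` runs, the spaced and unspaced forms of the expression produce the same token list.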


I have Java strings which are boolean expressions with parentheses, `&`, `|`, and `!` as operators, and I want to split them into tokens. For example:

`((!A1)&(B2|C3))` should become `"(","(","!","A1",")","&","(","B2","|","C3",")",")"`

Following this answer I found that I can use Java's `String.split()` with a regex that includes lookahead and lookbehind clauses:

    String[] tokens = "((!A1)&(B2|C3))".split("((?<=[!&()|])|(?=[!&()|]))");

My only problem is that whitespace will be included in the list of tokens. For example, if I were to write the expression as `( ( !A1 ) & ( B2 | C3 ) )`, then my `split()` would produce at least four strings like `" "`, and there would be padding around my variables (e.g. `" A1 "`).

How can I modify this split expression and regex to tokenize the string but not keep any of the whitespace?

Answer 1

Score: 1

Instead of `split()`, you can use this regex to match the tokens you want:

    [!&()|]|[^!&()|\h]+

**RegEx Details:**

- `[!&()|]`: Match a single `!`, `&`, `(`, `)`, or `|` (inside a character class, `|` is a literal pipe)
- `|`: OR
- `[^!&()|\h]+`: Match one or more characters that are NOT `!`, `&`, `(`, `)`, `|`, or horizontal whitespace

**Code:**

    import java.util.ArrayList;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    final String regex = "[!&()|]|[^!&()|\\h]+";
    final String string = "((!A1)&( B2 | C3 ))";

    final Pattern pattern = Pattern.compile(regex);
    final Matcher matcher = pattern.matcher(string);

    // Collect every match; whitespace is never matched, so it never becomes a token.
    List<String> result = new ArrayList<>();
    while (matcher.find()) {
        result.add(matcher.group(0));
    }

    // prints [(, (, !, A1, ), &, (, B2, |, C3, ), )]
    System.out.println(result);
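To see that matching tokens gives the same result whether or not the expression contains spaces, the approach can be wrapped in a small helper. This is a minimal sketch, not the answer author's code: the class name `BooleanExpressionTokenizer` and the method `tokenize` are hypothetical, `|` is listed in both character classes so that `B2|C3` written without spaces is still split into three tokens, and `\s` is swapped in for `\h` so that newlines are skipped as well.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BooleanExpressionTokenizer {
    // Either one operator character, or a run of characters that are
    // neither operators nor whitespace (assumed here to be a variable name).
    private static final Pattern TOKEN = Pattern.compile("[!&()|]|[^!&()|\\s]+");

    // Hypothetical helper: returns the tokens in order of appearance.
    static List<String> tokenize(String expression) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(expression);
        while (matcher.find()) {
            tokens.add(matcher.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Both lines print [(, (, !, A1, ), &, (, B2, |, C3, ), )]
        System.out.println(tokenize("((!A1)&(B2|C3))"));
        System.out.println(tokenize("( ( !A1 ) & ( B2 | C3 ) )"));
    }
}
```

Compiling the pattern once as a constant avoids recompiling it on every call, which matters if many expressions are tokenized.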



huangapple
  • Published on 2020-08-06 00:33:21
  • When reposting, please keep the link to this article: https://go.coder-hub.com/63269591.html