Java RegExp: split a string while keeping the delimiters

# Question

So, I have a simple string that looks like this:

    word1 word2! word3? word4; word5, word6
    word7 //new line
    !word8! word9 word10 word11 word12

I want to split this string while keeping the whitespace and newline delimiters.
Right now I'm calling `s.split()` with the expression `[\\s\\r\\n]` as its argument, and the output is:

[word1, word2!, word3?, word4;, word5,, word6, , word7, , !word8!, word9, word10, word11, word12]

I'm okay with the spaces not being saved. But what can I do about a `\n` being saved as just a whitespace?

UPD: I pass this string through a RabbitMQ query. In Java it looks like this:

"word1 word2! word3? word4; word5, word6\nword7\n!word8! word9 word10 word11 word12"




# Answer 1
**Score**: 1

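You can extract the whitespace and non-whitespace strings (basically, tokenize the text into whitespace and non-whitespace chunks) using the `\S+|\s+` regex.

See the Java demo: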

```java
import java.util.*;
import java.util.regex.*;

class Ideone
{
    public static void main (String[] args) throws java.lang.Exception
    {
        String line = "word1 word2! word3? word4; word5, word6\nword7\n!word8! word9 word10 word11 word12";
        // Match either a run of non-whitespace (\S+) or a run of whitespace (\s+)
        Pattern p = Pattern.compile("\\S+|\\s+");
        Matcher m = p.matcher(line);
        List<String> res = new ArrayList<>();
        // Each find() call yields the next chunk, so the delimiters are collected too
        while (m.find()) {
            res.add(m.group());
        }
        System.out.println(res);
    }
}
```

Output:

    [word1,  , word2!,  , word3?,  , word4;,  , word5,,  , word6, 
    , word7, 
    , !word8!,  , word9,  , word10,  , word11,  , word12]

where the line breaks are literal line break characters.
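Since the question says that plain spaces do not need to be kept but `\n` does, two variations on the approach above may be worth sketching. They are not part of the original demo: the class name `KeepDelimiters` and both helper methods are hypothetical names chosen for illustration. The first swaps `\s+` for `\R` (the linebreak matcher available since Java 8), so only words and line breaks are returned. The second stays with `split()` and uses zero-width lookarounds, so every whitespace character is kept as its own token.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KeepDelimiters {
    // A word (\S+) or a single line break (\R). Spaces and tabs match
    // neither alternative, so find() simply skips over them.
    private static final Pattern WORD_OR_LINE_BREAK = Pattern.compile("\\S+|\\R");

    // Hypothetical helper: returns words and line-break tokens, dropping other whitespace.
    static List<String> wordsAndLineBreaks(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = WORD_OR_LINE_BREAK.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    // Hypothetical helper: splits at the zero-width positions just before and
    // just after each whitespace character, so nothing is consumed by split().
    static List<String> splitKeepingWhitespace(String text) {
        return Arrays.asList(text.split("(?<=\\s)|(?=\\s)"));
    }

    public static void main(String[] args) {
        String line = "word5, word6\nword7\n!word8! word9";
        System.out.println(wordsAndLineBreaks(line));
        System.out.println(splitKeepingWhitespace(line));
    }
}
```

For the shortened input in `main`, the first helper should return seven tokens (five words and two `"\n"` entries), and the second should return nine, because the two spaces are kept as well. Note that the lookaround version emits each whitespace character separately, whereas `\s+` groups a run of whitespace into a single token.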


huangapple
  • Posted on April 4, 2023, 17:58:42
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75928021.html