Java RegExp Split String with saving delimiters
Question
So, I have a simple string that looks like this:
word1 word2! word3? word4; word5, word6
word7 //new line
!word8! word9 word10 word11 word12
I want to split this string while keeping the whitespace and newline delimiters.
Right now I'm calling `s.split()` with the expression `[\\s\\r\\n]` as its argument, and the output is:
[word1, word2!, word3?, word4;, word5,, word6, , word7, , !word8!, word9, word10, word11, word12]
I'm fine with the spaces not being saved. But what can I do about a `\n` being saved as if it were just another whitespace character?
UPD: I pass this string through RabbitMQ. In Java it looks like this:
"word1 word2! word3? word4; word5, word6\nword7\n!word8! word9 word10 word11 word12"
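# Answer 1
**Score**: 1
You can use the `\S+|\s+` regex to extract both the whitespace and the non-whitespace strings (basically, tokenizing the text into whitespace and non-whitespace chunks).
See the Java demo: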
```java
import java.util.*;
import java.util.regex.*;

class Ideone
{
    public static void main (String[] args) throws java.lang.Exception
    {
        String line = "word1 word2! word3? word4; word5, word6\nword7\n!word8! word9 word10 word11 word12";
        // Match either a run of non-whitespace or a run of whitespace
        Pattern p = Pattern.compile("\\S+|\\s+");
        Matcher m = p.matcher(line);
        List<String> res = new ArrayList<>();
        // Collect every match in order, so the delimiters are kept as tokens
        while (m.find()) {
            res.add(m.group());
        }
        System.out.println(res);
    }
}
```
Output:
[word1, , word2!, , word3?, , word4;, , word5,, , word6,
, word7,
, !word8!, , word9, , word10, , word11, , word12]
The line breaks in the output are the literal newline characters that were kept as tokens.
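If only the line breaks need to survive and ordinary spaces can be dropped, as the question suggests, one possible variation is to match `\S+|\n` instead. This is a minimal sketch rather than part of the demo above: the class name `KeepNewlines` and the method `tokenize` are made up for illustration, and it assumes lines are separated by a bare `\n` (not `\r\n`).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KeepNewlines {
    // Matches either a run of non-whitespace characters or a single '\n'.
    // Spaces and tabs match neither alternative, so find() skips over them.
    private static final Pattern TOKEN = Pattern.compile("\\S+|\\n");

    // Hypothetical helper: returns the words and the newlines, in order.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String line = "word1 word2! word3? word4; word5, word6\nword7\n!word8! word9 word10 word11 word12";
        // Each "\n" shows up as its own list element; spaces are gone.
        System.out.println(tokenize(line));
    }
}
```

Because each `\n` is matched on its own, two consecutive line breaks produce two separate tokens, which makes blank lines easy to detect afterwards.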