Split a string using space when not surrounded by specific characters
Question
I need to split a string on spaces but keep together the words surrounded by specific marker characters.
The markers can be a backtick (`), a single asterisk (*) or a double asterisk (**).
Let me give an example:
The `String class` represents character strings.
All *string literals* in **Java programs**, such as **abc**
I want to get this result:
The
`String class`
represents
character
strings.
All
*string literals*
in
**Java programs**
,
such
as
**abc**
I am able to write a regexp that splits my input string into parts if I have only one kind of marker character, but unfortunately I have multiple markers.
This is the regexp I use in my code: `[^\s"]+|"[^"]*("|$)`. It works fine, but only with a single marker:
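import java.util.regex.Matcher;
import java.util.regex.Pattern;

String marker = "`";
String data = "The `String class` represents character strings. All *string literals* in **Java programs**, such as **abc**...";
// A run of characters that are neither whitespace nor the marker,
// or the marker followed by anything up to the next marker (or the end of input)
String regexp = "[^\\s" + marker + "]+|" + marker + "[^" + marker + "]*(" + marker + "|$)";
Pattern pattern = Pattern.compile(regexp);
Matcher regexMatcher = pattern.matcher(data);
while (regexMatcher.find()) {
    System.out.println(regexMatcher.group());
}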
Output:
The
`String class`
...
*string
literals*
in
**Java
programs**,
such
as
**abc**...
I have tried to combine multiple markers, but the following does not work:
String marker = "`|\\*";
I could write plain Java code to do this job, but I thought a regexp would be easier. Now I am not sure about that.
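A minimal sketch, not taken from the original thread, of how the single-marker construction above might be extended to several markers: build one `marker.*?marker` alternative per marker, escape each marker with `Pattern.quote`, try longer markers first so that `**` wins over `*`, and fall back to `\S+` for ordinary words. The class and method names (`MarkerSplitter`, `buildPattern`, `split`) are hypothetical, and `List.of` assumes Java 9 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: splits on whitespace but keeps marker-enclosed spans together.
public class MarkerSplitter {

    // One alternative per marker: the marker, the shortest run of characters,
    // then the same marker again. Longer markers come first so "**" beats "*".
    // A final \S+ alternative catches every other non-whitespace chunk.
    static Pattern buildPattern(List<String> markers) {
        List<String> sorted = new ArrayList<>(markers);
        sorted.sort((a, b) -> Integer.compare(b.length(), a.length()));
        StringBuilder sb = new StringBuilder();
        for (String marker : sorted) {
            String quoted = Pattern.quote(marker); // treat the marker literally
            sb.append(quoted).append(".*?").append(quoted).append('|');
        }
        sb.append("\\S+");
        return Pattern.compile(sb.toString());
    }

    static List<String> split(String data, List<String> markers) {
        List<String> parts = new ArrayList<>();
        Matcher m = buildPattern(markers).matcher(data);
        while (m.find()) {
            parts.add(m.group());
        }
        return parts;
    }

    public static void main(String[] args) {
        String data = "The `String class` represents character strings. "
                + "All *string literals* in **Java programs**, such as **abc**";
        for (String part : split(data, List.of("`", "*", "**"))) {
            System.out.println(part);
        }
    }
}
```

Unlike the `(marker|$)` alternative in the single-marker regexp, this sketch does not match an unclosed marker up to the end of the input; such a marker simply ends up inside an ordinary `\S+` token.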
Answer 1
Score: 1
You may extract them with
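`[^`]*`|(\*{1,2}).*?\1|\S+
This pattern matches, in order of preference: a span between backticks; a span that starts with one or two asterisks and ends with the same number of asterisks (the backreference \1 repeats whatever group 1 captured); or any other run of non-whitespace characters.
In Java code, double the backslashes:
String regex = "`[^`]*`|(\\*{1,2}).*?\\1|\\S+";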
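A minimal runnable sketch that plugs the answer's pattern (with the `\1` backreference) into the same `find()` loop the question uses. The class and method names (`SplitKeepingMarked`, `tokens`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class for the answer's regex.
public class SplitKeepingMarked {

    // Alternative 1: a backtick-quoted span.
    // Alternative 2: one or two asterisks (group 1), the shortest run of
    //                characters, then the same asterisks again (\1).
    // Alternative 3: any other run of non-whitespace characters.
    private static final Pattern TOKEN =
            Pattern.compile("`[^`]*`|(\\*{1,2}).*?\\1|\\S+");

    static List<String> tokens(String data) {
        List<String> result = new ArrayList<>();
        Matcher m = TOKEN.matcher(data);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String data = "The `String class` represents character strings. "
                + "All *string literals* in **Java programs**, such as **abc**...";
        tokens(data).forEach(System.out::println);
    }
}
```

On the question's sample string this prints the tokens of the expected result followed by a separate `...` token, because the sample data ends in `**abc**...` and the trailing dots are picked up by the `\S+` alternative.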