Split a string using space when not surrounded by specific characters

Question

I need to split a string on spaces but keep together the words surrounded by specific characters.
The specific characters can be `, * or **.

Let me give an example:

The `String class` represents character strings.
All *string literals* in **Java programs**, such as **abc**

I want to get this result:

The
`String class`
represents
character
strings.
All
*string literals*
in
**Java programs**
,
such
as
**abc**

I am able to write a regexp that splits my input string into parts if there is only one kind of marker character. Unfortunately, I have multiple markers.

This is the regexp I use in my code, shown here with " as the marker: [^\s"]+|"[^"]*("|$). It works fine, but only with one marker:

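import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SingleMarker {
    public static void main(String[] args) {
        String marker = "`";
        String data = "The `String class` represents character strings. All *string literals* in **Java programs**, such as **abc**...";

        // Either a run of characters that are neither whitespace nor the marker,
        // or a marker-delimited span (an unclosed span runs to the end of the input).
        String regexp = "[^\\s" + marker + "]+|" + marker + "[^" + marker + "]*(" + marker + "|$)";
        Pattern pattern = Pattern.compile(regexp);
        Matcher regexMatcher = pattern.matcher(data);

        while (regexMatcher.find()) {
            System.out.println(regexMatcher.group());
        }
    }
}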

Output:

The
`String class`
...
*string
literals*
in
**Java
programs**,
such
as
**abc**...

I have tried combining multiple markers, but the following does not work:

String marker = "`|\\*";
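To see why the combined marker fails, the sketch below builds the regexp exactly as the question's code does, assuming the marker is written as "`|\\*" so that it compiles. The class `MarkerAttempt` and its methods are hypothetical names for illustration. The marker text is pasted into character classes, where `|` is an ordinary character, and its top-level `|` also splits the whole pattern into three alternatives: a plain chunk, a lone backtick, and a *...* span. So a backtick becomes a one-character match of its own, and `**` is matched as an empty *...* span.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MarkerAttempt {
    // Assembles the regexp the same way as the question's code.
    static String buildRegex(String marker) {
        return "[^\\s" + marker + "]+|" + marker + "[^" + marker + "]*(" + marker + "|$)";
    }

    // Collects every match of the regexp, in order.
    static List<String> tokens(String regex, String data) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(data);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String regex = buildRegex("`|\\*");
        // The assembled pattern text: [^\s`|\*]+|`|\*[^`|\*]*(`|\*|$)
        System.out.println(regex);
        System.out.println(tokens(regex, "The `String class` represents"));
        System.out.println(tokens(regex, "*string literals* in **Java programs**,"));
    }
}
```

On these inputs only *string literals* stays together, and only because the third alternative happens to be a *...* span; the backtick and ** cases fall apart into separate pieces.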

I could write plain Java code to do this job, but I thought a regexp would be easier. Now I am not so sure.

Answer 1

Score: 1

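You can extract the tokens with

`[^`]*`|(\*{1,2}).*?\1|\S+

See proof. This pattern matches a span between backticks, a span between single or double asterisks (the backreference \1 requires the closing asterisks to be the same as the opening ones), and any other chunk of non-whitespace characters.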
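A minimal sketch that applies this pattern, in its form with the \1 backreference, to the question's sample sentence and collects the matches into a list. `MarkerSplitDemo` and `tokens` are hypothetical names chosen for illustration; in a Java string literal each backslash of the pattern is doubled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MarkerSplitDemo {
    // A `...` span, a *...* or **...** span (\1 repeats the opening asterisks),
    // or any other run of non-whitespace characters.
    static final Pattern TOKEN = Pattern.compile("`[^`]*`|(\\*{1,2}).*?\\1|\\S+");

    static List<String> tokens(String data) {
        List<String> result = new ArrayList<>();
        Matcher m = TOKEN.matcher(data);
        while (m.find()) {
            result.add(m.group()); // group() is the whole match, not the captured asterisks
        }
        return result;
    }

    public static void main(String[] args) {
        String data = "The `String class` represents character strings. "
                + "All *string literals* in **Java programs**, such as **abc**";
        tokens(data).forEach(System.out::println);
    }
}
```

The comma after **Java programs** becomes its own token: the asterisk alternative stops at the matching `**`, and the next find() call picks up `,` with \S+. Note that `.` does not match line breaks by default, so an asterisk span that crosses a newline would not be kept together unless Pattern.DOTALL is used.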

In Java source code, double the backslashes:

String regex = "`[^`]*`|(\\*{1,2}).*?\\1|\\S+";
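If the markers need to stay configurable, as with the question's `marker` variable, the same alternation can be generated from a list. This is a sketch under that assumption, using a hypothetical `ConfigurableMarkers.build` helper rather than anything from the answer: each marker is escaped with `Pattern.quote`, longer markers are tried first so `**` wins over `*` at the same position, and \S+ comes last. Unlike `[^`]*`, the `.` in `.*?` does not cross line breaks without Pattern.DOTALL.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ConfigurableMarkers {
    // Hypothetical helper: one "marker ... marker" alternative per marker, then \S+.
    static Pattern build(List<String> markers) {
        List<String> sorted = new ArrayList<>(markers);
        // Longest first, so "**" is tried before "*" at the same position.
        sorted.sort((a, b) -> Integer.compare(b.length(), a.length()));
        StringBuilder sb = new StringBuilder();
        for (String m : sorted) {
            String q = Pattern.quote(m); // match the marker literally, so * needs no escaping
            sb.append(q).append(".*?").append(q).append('|');
        }
        sb.append("\\S+"); // anything else: a plain non-whitespace chunk
        return Pattern.compile(sb.toString());
    }

    static List<String> tokens(Pattern pattern, String data) {
        List<String> result = new ArrayList<>();
        Matcher m = pattern.matcher(data);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        Pattern p = build(List.of("`", "*", "**"));
        String data = "The `String class` represents character strings. "
                + "All *string literals* in **Java programs**, such as **abc**";
        tokens(p, data).forEach(System.out::println);
    }
}
```

Sorting by length keeps the result independent of the order in which the markers are listed.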

huangapple
  • This article was published on May 4, 2020 at 01:37:44.
  • If you repost it, please keep the link to this article: https://go.coder-hub.com/61578941.html