How do I split a string into sentences while maintaining all the spaces in Java?

Question

I tried the following regex:

sentences = sb.toString().split("(?<=[a-z])*\\.\\s*");

I am using a StringBuilder sb, converting it to a String, and then calling split on it.
The regex is meant to match zero or more characters before the '.' and zero or more spaces after the '.'.

However, it doesn't work for the following input:

Hello World. Shipped to U.S on Friday.We are here .Good input

I need to keep the space before "We are here".

Required output:

Hello World
 Shipped to U.S on Friday
 We are here
Good input

Answer 1

Score: 3

Use this regex: ([^\.]+)(\.|$)*?
Capture group 1 holds each run of non-period characters, spaces included. You can read about group matching and see the full matches here: https://regex101.com/r/yV9GES/5

Edit: updated the link in response to the comment.
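
As a minimal sketch of how this pattern might be applied in Java, the loop below calls Matcher.find() repeatedly and collects group 1. The class name RegexSentences and the method name sentences are illustrative choices, not part of the original answer. Because the trailing (\.|$)*? is lazy, it matches nothing here, so each match is simply a run of non-period characters. An abbreviation such as U.S would therefore be cut at its period as well.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexSentences {
    // Pattern from the answer; group 1 is a run of characters that contains no period.
    private static final Pattern SENTENCE = Pattern.compile("([^\\.]+)(\\.|$)*?");

    // Hypothetical helper: collects group 1 of every match, leaving whitespace untouched.
    public static List<String> sentences(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = SENTENCE.matcher(input);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        // Brackets make leading and trailing spaces visible in the output.
        for (String s : sentences("Hello World. We are here .Good input.")) {
            System.out.println("[" + s + "]");
        }
    }
}
```

For the sample string this prints [Hello World], [ We are here ] and [Good input], one per line.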

Answer 2

Score: 1

Split your string using the regex \\. (as written in a Java string literal), i.e. on each literal period.

Demo:

import java.util.Arrays;

public class Main {
    public static void main(String[] args) {
        // split("\\.") cuts at each period and leaves the surrounding spaces untouched
        System.out.println(Arrays.toString("Hello World. We are here .Good input.".split("\\.")));
    }
}

Output:

[Hello World,  We are here , Good input]
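
One caveat shows up when the same split is run on the full input from the question: the period inside U.S is treated as a sentence boundary too. The sketch below compares the plain split with a hypothetical lookahead variant that is not from any of the answers. It assumes a sentence-ending period is followed by whitespace, by a capitalized word (an uppercase letter and then a lowercase letter), or by the end of the input. The class and method names are illustrative, and the heuristic would still misfire on text such as "U.S. Army".

```java
import java.util.Arrays;
import java.util.List;

class SplitCaveat {
    // The answer's approach: split on every literal period.
    public static List<String> splitOnEveryDot(String input) {
        return Arrays.asList(input.split("\\."));
    }

    // Hypothetical variant: split only on a period followed by whitespace,
    // an uppercase-then-lowercase letter pair, or the end of the input.
    public static List<String> splitOnSentenceDot(String input) {
        return Arrays.asList(input.split("\\.(?=\\s|\\p{Upper}\\p{Lower}|$)"));
    }

    public static void main(String[] args) {
        String input = "Hello World. Shipped to U.S on Friday.We are here .Good input";
        System.out.println(splitOnEveryDot(input));
        System.out.println(splitOnSentenceDot(input));
    }
}
```

The first call breaks "U.S" into " Shipped to U" and "S on Friday", while the second keeps " Shipped to U.S on Friday" as one element. Both leave existing spaces where they are in the input.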

Answer 3

Score: 1

Why do you have to use a regex?

You can simply use indexOf and substring:

import java.util.ArrayList;
import java.util.List;

  public List<String> splitOnDot(String input) {
    List<String> result = new ArrayList<>();
    int idx;
    // Cut off everything before the next period, then continue with the remainder
    while ((idx = input.indexOf('.')) != -1) {
      result.add(input.substring(0, idx));
      input = input.substring(idx + 1);
    }
    // Keep trailing text that has no closing period
    if (!input.isEmpty()) {
      result.add(input);
    }
    return result;
  }

Successful test:

  @Test
  public void test1() {
    assertThat(splitOnDot("Hello World. We are here .Good input.")).contains("Hello World", " We are here ", "Good input");
  }

huangapple
  • Published on August 9, 2020, 02:40:20
  • When reposting, please keep the link to this article: https://go.coder-hub.com/63319023.html