How can I split paragraphs into proper sentences in Java using split()?

Question

I want to split paragraphs into sentences. For example, I want to split "Mary had a little lamb. Its fleece was white." into:
"Mary had a little lamb."
"Its fleece was white."

Currently I am using text.split("[.]"); and I get:
"Mary had a little lamb" (the full stop is missing, and I need it)
" Its fleece was white" (there is a space before the sentence, and the full stop is still missing)

What I intend to do is split this paragraph into proper sentences and put them into an array.

String text = sc.nextLine();
String[] sentence = text.split("[.]");

Please help!

Answer 1

Score: 1

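You can just append the full stop '.' to each string after splitting.

Something like:

String[] splitString = theString.split("[.]");
for (int i = 0; i < splitString.length; i++) {
    // Assigning to the variable of an enhanced for loop would not change the
    // array, so write back by index; trim() also drops the leading space
    splitString[i] = splitString[i].trim() + ".";
}

Something in that direction.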
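An alternative that avoids re-appending the period is to split on the whitespace that follows a period, using a lookbehind so the period stays attached to its sentence. This is a minimal sketch that assumes sentences are separated by whitespace; the class and method names are hypothetical.

```java
import java.util.Arrays;

public class SplitKeepPeriod {
    // Hypothetical helper: split after each "." that is followed by whitespace.
    // (?<=\.) is a zero-width lookbehind, so the period is not consumed;
    // \s+ consumes the spaces, so no sentence starts with a blank.
    static String[] splitSentences(String text) {
        return text.trim().split("(?<=\\.)\\s+");
    }

    public static void main(String[] args) {
        String text = "Mary had a little lamb. Its fleece was white.";
        System.out.println(Arrays.toString(splitSentences(text)));
        // prints [Mary had a little lamb., Its fleece was white.]
    }
}
```

Like the answer above, this still treats every period as a sentence end, so abbreviations will be split.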

Answer 2

Score: 0

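Something like this should work:

public class Main {
    public static void main(String[] args) {
        String paragraph = "Mary had a little lamb. Its fleece was white.";

        // split() consumes the "." delimiter, so it has to be added back
        String[] sentences = paragraph.split("[.]");

        for (String sentence : sentences) {
            // trim() removes the space that followed the previous period
            System.out.println(sentence.trim() + ".");
        }
    }
}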

Answer 3

Score: 0

If there is any chance that sentence splitting is not the only natural language processing step you need, you should consider using a natural language processing (NLP) library such as OpenNLP. You can try out OpenNLP through a web interface provided by the Gate project, which has made an English language processing pipeline available as a web page. Use the "Customise Annotations" button to see the sentence structure.
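If a full NLP library is more than you need, the JDK itself ships a locale-aware sentence splitter, java.text.BreakIterator. This is a swapped-in technique, not OpenNLP, shown as a minimal sketch; its rules are heuristic, so abbreviations may still produce wrong breaks. The helper name is hypothetical.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class BreakIteratorSplit {
    // Hypothetical helper: collect sentences using the JDK's sentence BreakIterator.
    static List<String> splitSentences(String text, Locale locale) {
        BreakIterator it = BreakIterator.getSentenceInstance(locale);
        it.setText(text);
        List<String> sentences = new ArrayList<>();
        int start = it.first();
        // next() returns the end of each sentence, or DONE after the last one
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            // boundaries include trailing whitespace, so trim each piece
            String s = text.substring(start, end).trim();
            if (!s.isEmpty()) {
                sentences.add(s);
            }
        }
        return sentences;
    }

    public static void main(String[] args) {
        String text = "Mary had a little lamb. Its fleece was white.";
        for (String s : splitSentences(text, Locale.US)) {
            System.out.println(s);
        }
    }
}
```

Unlike a plain regex, the iterator's rules also treat "!" and "?" as sentence terminators.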

Answer 4

Score: -1

Assume this sentence:

String sResult = "This is a test. This is a T.L.A. test.";

then you'd better split only at a period that follows a lowercase letter, so the periods inside "T.L.A." are left alone:

String sResult = "This is a test. This is a T.L.A. test.";
String[] sSentence = sResult.split("(?<=[a-z])\\.\\s+");

Result:

This is a test
This is a T.L.A. test.

Note that some abbreviations do not end with capital letters, such as abbrev., Mr., etc. And some sentences don't end in periods at all!
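A variation on the regex above, sketched under the same lowercase-before-terminator assumption: widening the terminator to [.!?] and moving it inside the lookbehind keeps the punctuation on each sentence and also splits on exclamation and question marks. Abbreviations ending in a lowercase letter, such as "Mr.", are still split incorrectly. The class and constant names are hypothetical.

```java
import java.util.Arrays;

public class RegexSentenceSplit {
    // Split on whitespace preceded by a lowercase letter and a terminator.
    // Both characters sit inside a fixed-length lookbehind, so they stay in
    // the sentence; only the whitespace itself is consumed.
    private static final String SENTENCE_BREAK = "(?<=[a-z][.!?])\\s+";

    static String[] splitSentences(String text) {
        return text.trim().split(SENTENCE_BREAK);
    }

    public static void main(String[] args) {
        String text = "This is a test. This is a T.L.A. test! Is it?";
        System.out.println(Arrays.toString(splitSentences(text)));
        // prints [This is a test., This is a T.L.A. test!, Is it?]
    }
}
```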

huangapple
  • Posted on 18 August 2020 at 20:27:22
  • When reposting, please keep the link to this article: https://go.coder-hub.com/63468593.html