How can I split paragraphs into proper sentences in Java using split()?
Question
I want to split a paragraph into sentences. For example, given "Mary had a little lamb. Its fleece was white.", I want to get:
"Mary had a little lamb."
"Its fleece was white."
So far I have tried text.split("[.]"); and got this result:
"Mary had a little lamb" (the full stop is missing, and I need it)
" Its fleece was white" (there is a leading space, and again no full stop)
What I want is to split the paragraph into proper sentences and put them into an array.
String text = sc.nextLine();
String[] sentence = text.split("[.]");
Please help!
Answer 1
Score: 1
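You can just append the full stop '.' to each string after splitting.
Something like:
String[] splitString = theString.split("[.]");
for (int i = 0; i < splitString.length; i++) {
    // Assign back into the array: changing a for-each loop variable would leave the array untouched.
    splitString[i] = splitString[i].trim() + ".";
}
Something along those lines.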
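As an alternative to re-appending the full stop, split() can be given a lookbehind so that the period is never consumed in the first place. The following is only a minimal sketch: the class name KeepFullStopSplit and the method splitSentences are made up for illustration, and it assumes sentences end in "." followed by whitespace.

```java
class KeepFullStopSplit {

    // Splits on the whitespace that follows a full stop. The lookbehind
    // (?<=[.]) only checks that a '.' precedes the whitespace; it does not
    // consume it, so each sentence keeps its full stop.
    static String[] splitSentences(String text) {
        return text.trim().split("(?<=[.])\\s+");
    }

    public static void main(String[] args) {
        String text = "Mary had a little lamb. Its fleece was white.";
        for (String sentence : splitSentences(text)) {
            System.out.println(sentence);
        }
    }
}
```

Because the delimiter is now just the whitespace, the leading space from the question disappears as well. Abbreviations such as "Mr." would still be treated as sentence ends.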
Answer 2
Score: 0
Something like this should work:
public class Main {
    public static void main(String[] args) {
        String paragraph = "Mary had a little lamb. Its fleece was white.";
        String[] sentences = paragraph.split("[.]");
        for (String sentence : sentences) {
            // split() drops the delimiter, so trim the leading space and put the full stop back.
            System.out.println(sentence.trim() + ".");
        }
    }
}
Answer 3
Score: 0
Assuming there is at least a chance that sentence splitting is not the last bit of natural language processing you will need, you should consider using a natural language processing (NLP) library such as OpenNLP. You can try out OpenNLP through a web interface thanks to the GATE project, which has made an English language processing pipeline available as a web page. Make sure to use the "Customise Annotations" button to see the sentence structure.
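If a full NLP library is more than you need, the JDK itself ships a locale-aware sentence segmenter, java.text.BreakIterator. This is a different technique from OpenNLP, not a substitute for it, and the sketch below is only illustrative: the class name BreakIteratorSentences and the method splitSentences are made up for this example.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class BreakIteratorSentences {

    // Walks the sentence boundaries reported by BreakIterator and collects
    // the text between each pair of boundaries, trimmed of whitespace.
    static List<String> splitSentences(String text) {
        List<String> sentences = new ArrayList<>();
        BreakIterator boundaries = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        boundaries.setText(text);

        int start = boundaries.first();
        for (int end = boundaries.next(); end != BreakIterator.DONE; end = boundaries.next()) {
            String sentence = text.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                sentences.add(sentence);
            }
            start = end;
        }
        return sentences;
    }

    public static void main(String[] args) {
        String text = "Mary had a little lamb. Its fleece was white.";
        for (String sentence : splitSentences(text)) {
            System.out.println(sentence);
        }
    }
}
```

The result is a List<String>; call toArray(new String[0]) on it if an array is required, as in the question.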
Answer 4
Score: -1
Assume this text:
String sResult = "This is a test. This is a T.L.A. test.";
Then you are better off trying this:
String[] sSentence = sResult.split("(?<=[a-z])\\.\\s+");
Result (the first sentence loses its full stop, because the period is part of the delimiter):
This is a test
This is a T.L.A. test.
Note that there are abbreviations that do not end in a capital letter, such as abbrev., Mr., etc., so this regex will still split after them. There are also sentences that don't end in a period!
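A small variation on the regex above moves the period inside the lookbehind, so that only the whitespace is consumed and every sentence keeps its full stop. This is a sketch under the same assumptions as the answer (a sentence ends in a lowercase letter, a period, and whitespace); the names AbbreviationAwareSplit and splitSentences are made up for illustration.

```java
class AbbreviationAwareSplit {

    // Splits on whitespace that is preceded by a lowercase letter and a
    // full stop. Both characters sit inside the lookbehind, so neither is
    // removed from the resulting sentences.
    static String[] splitSentences(String text) {
        return text.trim().split("(?<=[a-z][.])\\s+");
    }

    public static void main(String[] args) {
        String text = "This is a test. This is a T.L.A. test.";
        for (String sentence : splitSentences(text)) {
            System.out.println(sentence);
        }
    }
}
```

The same limitation applies: an abbreviation ending in a lowercase letter, such as "Mr.", is still treated as the end of a sentence.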