Regex that matches words and words with apostrophes
Question
Sorry if this is a duplicate post. I have a method that is supposed to parse all the words in a file. A word should consist only of the letters a-z and apostrophes. Here's my code snippet:
public void loadInput(File fileName) throws IOException {
    try {
        Scanner sc = new Scanner(fileName);
        int numWords = 0;
        while (sc.hasNext("[A-Za-z\']+")) {
            String word = sc.next().toLowerCase(); // case-insensitive
            numWords++;
            System.out.println(word);
        }
        System.out.println("Total words in text file: " + numWords);
        sc.close();
    } catch (Exception e) {
        System.out.println("An error has occurred");
    }
}
As an example input:
alice's conversations in it, `and what is the use of a book,'
thought alice `without pictures or conversation?'
It should match all the words, including alice's, but not 'without (it should match only the word without).
Answer 1
Score: 0
You want a negative lookbehind. The pattern
(?<!')[\w']+
finds [\w']+ not preceded by a '.
See https://regex101.com/r/ZfyerX/3.
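Two caveats are worth checking before relying on either snippet above. First, Scanner.hasNext(pattern) tests the whole next whitespace-delimited token, so the loop in the question stops at the first token that does not fully match (here "it,"). Second, because the character class [\w'] itself contains the apostrophe, the lookbehind pattern can still begin a match at the leading ' of 'without. The following is only a minimal sketch of one alternative, not the answerer's method: it scans the text with Matcher.find() and accepts an apostrophe only between two letters. The class name WordExtractor and the method extractWords are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original question or answer.
class WordExtractor {
    // One or more letters, optionally followed by groups of
    // "apostrophe + letters", so an apostrophe is only accepted
    // when it sits between two letters (alice's, but not 'without).
    private static final Pattern WORD =
            Pattern.compile("[A-Za-z]+(?:'[A-Za-z]+)*");

    // Returns every word found in the text, lower-cased, in order.
    static List<String> extractWords(String text) {
        List<String> words = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            words.add(m.group().toLowerCase());
        }
        return words;
    }

    public static void main(String[] args) {
        // Sample text adapted from the question (assumed quoting style).
        String sample =
                "alice's conversations in it, 'and what is the use of a book,'\n"
              + "thought alice 'without pictures or conversation?'";
        List<String> words = extractWords(sample);
        for (String w : words) {
            System.out.println(w);
        }
        System.out.println("Total words in text file: " + words.size());
    }
}
```

To apply this to a file, the text could be read line by line (for example with Scanner.nextLine()) and each line passed to extractWords. Note that this sketch also drops a trailing apostrophe, as in the possessive dogs', which may or may not be what you want.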