Regex that matches words and words with apostrophes
Question
Sorry if this is a duplicate post. I have a method that is supposed to parse all the words in a file. A word should only consist of letters a-z and an apostrophe. Here's my code snippet:
public void loadInput(File fileName) throws IOException {
    try {
        Scanner sc = new Scanner(fileName);
        int numWords = 0;
        // hasNext(pattern) is true only if the whole next token matches the pattern
        while (sc.hasNext("[A-Za-z\']+")) {
            String word = sc.next().toLowerCase(); // case-insensitive
            numWords++;
            System.out.println(word);
        }
        System.out.println("Total words in text file: " + numWords);
        sc.close();
    } catch (Exception e) {
        System.out.println("An error has occurred");
    }
}
As an example input:
alice's conversations in it, `and what is the use of a book,'
thought alice `without pictures or conversation?'
It should match all the words, including alice's, but not 'without (it should match only the word without).
Answer 1
Score: 0
You want a negative lookbehind.
(?<!')[\w']+ finds [\w']+ that is not preceded by a '.
See https://regex101.com/r/ZfyerX/3.
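One caveat worth checking before relying on the lookbehind alone: because the character class itself contains ', a match can still begin on the apostrophe, so 'without may come back with its leading quote. Also, as far as I can tell, Scanner.hasNext(pattern) requires the whole whitespace-delimited token to match, so the loop in the question would stop at a token such as it, with a trailing comma. A minimal sketch of an alternative, assuming it is acceptable to scan the text with Pattern and Matcher instead of Scanner tokens, is below. The class name WordExtractor, the method extractWords, and the pattern are illustrative choices, not part of the original post or the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named here only for illustration.
public class WordExtractor {
    // Assumed definition of a word: it starts and ends with a letter,
    // and apostrophes are allowed only between letters (e.g. alice's).
    private static final Pattern WORD = Pattern.compile("[A-Za-z]+(?:'[A-Za-z]+)*");

    // Returns every word found in the text, lower-cased, in order of appearance.
    public static List<String> extractWords(String text) {
        List<String> words = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            words.add(m.group().toLowerCase());
        }
        return words;
    }

    public static void main(String[] args) {
        String sample = "alice's conversations in it, 'and what is the use of a book,'\n"
                + "thought alice 'without pictures or conversation?'";
        List<String> words = extractWords(sample);
        words.forEach(System.out::println);
        System.out.println("Total words in text file: " + words.size());
    }
}
```

Because the pattern requires a letter at both ends of a word, surrounding quotes and punctuation are skipped without any lookbehind, while an apostrophe inside a word is kept. To use it on a file, the contents could be read line by line and each line passed to extractWords.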