Question

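I am writing this code to remove stop words from my text.

**Problem: the code removes stop words correctly, but it goes wrong when my text contains words like ant or ide. Both are removed, because ant is contained in important and want, and ide is contained in side. I don't want to split words into letters to remove stop words.**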

String sCurrentLine;
List<String> stopWordsofwordnet = new ArrayList<>();
FileReader fr = new FileReader("G:\\stopwords.txt");
BufferedReader br = new BufferedReader(fr);
while ((sCurrentLine = br.readLine()) != null) {
    stopWordsofwordnet.add(sCurrentLine);
}

List<String> wordsList = new ArrayList<>();
String text = request.getParameter("textblock");
text = text.trim().replaceAll("[\\s,;]+", " ");
String[] words = text.split(" ");

for (String word : words) {
    wordsList.add(word);
}

// remove stop words here from the temp list
for (int i = 0; i < wordsList.size(); i++) {
    for (int j = 0; j < stopWordsofwordnet.size(); j++) {
        if (stopWordsofwordnet.get(j).contains(wordsList.get(i).toLowerCase())) {
            out.println(wordsList.get(i) + "&nbsp;");
            wordsList.remove(i);
            i--;
            break;
        }
    }
}

for (String str : wordsList) {
    out.print(str + " ");
}


Answer 1

Score: 0

Your code is overly complicated and can be reduced to this:

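// Imports needed by this snippet (place them at the top of the file)
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.*;

// Load stop words from file; the case-insensitive comparator makes
// lookups in the set ignore case
Set<String> stopWords = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
stopWords.addAll(Files.readAllLines(Paths.get("G:\\stopwords.txt")));

// Get text and split into words
String text = request.getParameter("textblock");
List<String> wordsList = new ArrayList<>(Arrays.asList(
        text.replaceAll("[\\s,;]+", " ").trim().split(" ")));

// Remove stop words from list of words (whole-word matches only)
wordsList.removeAll(stopWords);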
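
For anyone who wants to try this outside a servlet, here is a minimal, self-contained sketch of the same idea. The question's loop uses `String.contains`, which is a substring test, so a short word like ant matches whenever it appears inside a longer stop word. A set lookup compares whole words instead. The class name `StopWordFilter`, the helper `removeStopWords`, and the in-memory stop word list are assumptions made for illustration; they are not part of the original post, which reads the list from `G:\\stopwords.txt`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class StopWordFilter {

    // Hypothetical helper (not from the original post): returns the words of
    // `text` that are not stop words, comparing whole words case-insensitively.
    public static List<String> removeStopWords(String text, Collection<String> stopWords) {
        // A TreeSet with a case-insensitive comparator makes contains() match
        // "The" against "the", but never "ant" against "important".
        Set<String> stopSet = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        for (String s : stopWords) {
            String trimmed = s.trim();
            if (!trimmed.isEmpty()) {
                stopSet.add(trimmed); // skip blank lines, drop stray whitespace
            }
        }

        List<String> words = new ArrayList<>();
        // Split on runs of whitespace, commas, and semicolons, as in the question.
        for (String w : text.trim().split("[\\s,;]+")) {
            if (!w.isEmpty() && !stopSet.contains(w)) {
                words.add(w);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        // Assumed sample stop words; a real list would come from the stop word file.
        List<String> stopWords = Arrays.asList("important", "want", "side", "the");
        // prints [ant, and, IDE]
        System.out.println(removeStopWords("The ant and the IDE, want; side", stopWords));
    }
}
```

Here ant and IDE survive even though important, want, and side are in the stop word list, while The, the, want, and side are removed as whole-word matches. Trimming each stop word is an extra precaution in this sketch, in case the file contains blank lines or trailing spaces.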

