Question

I'm attempting to read each line of an .inp file and write every non-duplicate line to a new file. The issue with the code I have so far is that all lines are written to the output file, regardless of whether they duplicate previous lines. I'm using a Scanner object to read the file and a BufferedReader/FileWriter object to write the output file.

How do I avoid writing the duplicates?

String book = reader.nextLine();
boolean duplicate = false;
while (reader.hasNext() == true) {
    try {
        duplicate = reader.hasNext(book);
        if (duplicate == true) {
            book = reader.nextLine();
        } else {
            writer.write(book + "\n");
            book = reader.nextLine();
        }
    } catch (NoSuchElementException ex) {
        break;
    }
}

Answer 1

Score: 1

Depending on the situation:

If the duplicate lines are sequential, maintain a variable to store the previous line and compare against it.
If the duplicate lines are not sequential, and there are relatively (*) few short lines, store the lines you've already processed in a HashSet and upon processing a line check whether the set already contains() the line or not.
If the duplicate lines are not sequential, and there are relatively (*) few but long lines, instead of storing the complete lines in a HashSet, store a hash (e.g. SHA1) of each line, and compare against that.
If the duplicate lines are not sequential, and there are a lot of long lines, combine the techniques described above with some form of persistent database or data store.

(*) Relative to available memory
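
As a rough illustration of the first two cases, here is a minimal sketch. The class name DedupeSketch and its method names are made up for this example, and it assumes the lines being tracked fit in memory and that the file is UTF-8. The first method compares each line against the previous one only; the other two rely on HashSet.add() returning false when the element is already in the set.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper class, not part of the original question or answer.
class DedupeSketch {

    // Sequential duplicates: remembering only the previous line is enough.
    static List<String> dropSequentialDuplicates(List<String> lines) {
        List<String> kept = new ArrayList<>();
        String previous = null;
        for (String line : lines) {
            if (!line.equals(previous)) {
                kept.add(line);
            }
            previous = line;
        }
        return kept;
    }

    // Non-sequential duplicates: remember every line seen so far.
    static List<String> dropAllDuplicates(List<String> lines) {
        Set<String> seen = new HashSet<>();
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            // add() returns false if the line was already in the set.
            if (seen.add(line)) {
                kept.add(line);
            }
        }
        return kept;
    }

    // Same HashSet idea, streaming from one file to another.
    // Assumes UTF-8 and that all distinct lines fit in memory.
    static void copyWithoutDuplicates(Path input, Path output) throws IOException {
        Set<String> seen = new HashSet<>();
        try (BufferedReader reader = Files.newBufferedReader(input, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (seen.add(line)) {
                    writer.write(line);
                    writer.newLine();
                }
            }
        }
    }
}
```

Both in-memory variants keep the first occurrence of each line and preserve the original order. For the long-line case, the same loop could store a digest of each line in the set instead of the line itself.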
