Question

I am working on a problem in which I pull a list of file names from a folder and store it in a database table. This process will run every hour, so if any file names read from the folder are duplicates, I don't want duplicate records in the table; the old record should just be updated. If there is anything new, a new record should be inserted.
I am using Spring Data JPA, and I know this can be done automatically with the saveAll method. What I need, though, is that when a file is a duplicate, another column in the table, "Description", is updated to say that this record got updated, and when a new record is inserted, it says that it is a new record.

I want to know the most efficient way of doing this without using any loop.

Answer 1

Score: 1

Basically, you have an async job, and this async job exists in the context of one or more instances of the application. There are a couple of problems you need to look after:

1) The job that reads the files needs to run on only one instance of the application. For this purpose you should use ShedLock (its @SchedulerLock annotation); search for it.
2) After you read the file names, you need to verify them against the DB. A couple of variants exist for this procedure:

A) Testing each file individually would cause one SELECT query per file, which may be undesirable.

B) You can select all existing files from your DB; your job is then to divide the incoming files into two groups: files that exist and files that don't. Another option would be to select all existing files.

C) If the number of files is so large that you cannot effectively read them all at once, you can create a second table, "Incoming files", persist all incoming files there, and then perform a JOIN with "SAVED_FILES" to find out which files are already saved.
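Variants A and B can be contrasted in a minimal, self-contained Java sketch. It assumes the file name uniquely identifies a row and replaces the database with a hypothetical in-memory FakeFileTable whose selectCount stands in for round trips to the DB. The names FakeFileTable, existsByFileName, findExistingNames, describeOneByOne, describeInBulk and the description strings "Record updated" / "New record" are illustrative only, not part of Spring Data JPA. Streams remove the explicit loop from the code, but they still iterate internally.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class FileSyncSketch {

    // Hypothetical in-memory stand-in for the files table: file name -> description.
    // selectCount counts the lookups a real repository would send to the database.
    static class FakeFileTable {
        final Map<String, String> rows = new HashMap<>();
        int selectCount = 0;

        // Variant A style lookup: one query per file name.
        boolean existsByFileName(String fileName) {
            selectCount++;
            return rows.containsKey(fileName);
        }

        // Variant B style lookup: one query for the whole batch of names.
        Set<String> findExistingNames(List<String> fileNames) {
            selectCount++;
            return fileNames.stream().filter(rows::containsKey).collect(Collectors.toSet());
        }

        // Stand-in for a single saveAll(...) call: existing keys are overwritten, new ones added.
        void saveAll(Map<String, String> batch) {
            rows.putAll(batch);
        }
    }

    // Variant A: one lookup per distinct incoming name.
    static Map<String, String> describeOneByOne(FakeFileTable table, List<String> incoming) {
        return incoming.stream()
                .distinct()
                .collect(Collectors.toMap(
                        name -> name,
                        name -> table.existsByFileName(name) ? "Record updated" : "New record"));
    }

    // Variant B: one lookup, then split the names into existing (true) and new (false).
    static Map<String, String> describeInBulk(FakeFileTable table, List<String> incoming) {
        List<String> names = incoming.stream().distinct().toList();
        Set<String> existing = table.findExistingNames(names);
        Map<Boolean, List<String>> groups = names.stream()
                .collect(Collectors.partitioningBy(existing::contains));
        return Stream.concat(
                        groups.get(true).stream().map(name -> Map.entry(name, "Record updated")),
                        groups.get(false).stream().map(name -> Map.entry(name, "New record")))
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
    }

    public static void main(String[] args) {
        List<String> incoming = List.of("a.txt", "b.txt", "a.txt", "c.txt");

        FakeFileTable tableA = new FakeFileTable();
        tableA.saveAll(Map.of("a.txt", "New record")); // row left by a previous run
        tableA.saveAll(describeOneByOne(tableA, incoming));

        FakeFileTable tableB = new FakeFileTable();
        tableB.saveAll(Map.of("a.txt", "New record")); // row left by a previous run
        tableB.saveAll(describeInBulk(tableB, incoming));

        System.out.println("Variant A lookups: " + tableA.selectCount);
        System.out.println("Variant B lookups: " + tableB.selectCount);
        System.out.println("a.txt -> " + tableB.rows.get("a.txt"));
    }
}
```

Both variants produce the same descriptions; the difference is that A needs one lookup per distinct file name while B needs one lookup in total. In a real Spring Data JPA application, the bulk lookup in B could be a single repository query (for example a hypothetical derived query such as findByFileNameIn(Collection<String> names)), and the resulting batch would be mapped to entities and passed to one saveAll call.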
