Update a column in the database based on new and duplicate records
Question
I am working on a problem in which I pull a list of file names from a folder and store it in a database table. This process will run every hour, so if any file name read from the folder is a duplicate, I don't want a duplicate record in the table; the old record should just be updated. If the file name is new, a record should be inserted.
I am using Spring Data JPA, and I know this can be done automatically with the saveAll method. What I need, though, is that if the file is a duplicate, another column, "Description", is updated to say that the record was updated, whereas a newly inserted record says that it is a new record.
I want to know the most efficient way of doing this without using any loop.
Answer 1
Score: 1
Basically, you have an async job, and this job runs in the context of one or more instances of the application. There are a couple of problems you need to take care of:
1) The job that reads the files needs to run on only one instance of the application. For this purpose you should use ShedLock (the @SchedulerLock annotation); search for it.
2) After you read the file names, you need to verify them against the DB. A couple of variants exist for this procedure:
A) Testing each file individually would cause one SELECT query per file, which may be undesirable.
B) You can select all existing files from your DB, and then your job is to divide the incoming files into two groups: files that exist and files that don't. Another option would be to select all existing files.
C) If the number of files is so big that you cannot effectively read them all at once, you can create a second table, "Incoming files", persist all incoming files there, and then perform a JOIN with "SAVED_FILES" to find the files that are already saved.
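Variant B can be sketched in plain Java roughly as follows. This is only a minimal sketch, assuming the existing file names have already been loaded into a Set (for example by a repository query); FileNamePartitioner, splitByExistence and describe are hypothetical names used for illustration and are not part of Spring Data JPA. A stream still iterates internally, so this avoids an explicit loop in your code rather than iteration as such.

```java
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper for illustration; not part of Spring Data JPA.
final class FileNamePartitioner {

    private FileNamePartitioner() {
    }

    /**
     * Splits the incoming file names into two groups.
     * Key true  -> names already present in existingNames (to be updated).
     * Key false -> names not yet stored (to be inserted).
     * Duplicates inside the incoming collection are dropped first.
     */
    static Map<Boolean, List<String>> splitByExistence(Set<String> existingNames,
                                                       Collection<String> incomingNames) {
        return incomingNames.stream()
                .distinct()
                .collect(Collectors.partitioningBy(existingNames::contains));
    }

    /** Assumed wording for the "Description" column; adjust to your own text. */
    static String describe(boolean alreadyStored) {
        return alreadyStored ? "record updated" : "new record";
    }
}
```

The true group can then get the "updated" description and the false group the "new" description before both are passed to saveAll in a single call.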