What happens on the DB side when I use multi-threading for update operations?

Question

Context of my question:

I use a proprietary database (the target database), and I cannot reveal its name (you probably wouldn't know it even if I did).

I usually need to update records in this database using Java. The number of records varies from 20,000 to 40,000.
Each update transaction takes one or two seconds on this DB, so a sequential run takes hours (roughly 5.5 to 22 hours for 20,000 to 40,000 records). The database API provides no batch-execution functions. So instead of processing all the records sequentially, I am thinking of using Java's multi-threading and creating one thread for every 100 records, since Java can run these threads in parallel.
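A minimal sketch of the chunking idea above, assuming a hypothetical `updateRecord(int id)` that stands in for the proprietary API's single-record update and that can safely be called from several threads at once (for example because each worker uses its own connection). It splits the IDs into chunks of 100 and submits one task per chunk to a fixed-size `ExecutorService`, rather than starting one raw `Thread` per chunk.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class ChunkedUpdater {
    static final int CHUNK_SIZE = 100;

    // Counts calls so the sketch runs without a database.
    static final AtomicInteger updated = new AtomicInteger();

    // HYPOTHETICAL stand-in for the proprietary API's single-record update.
    // Assumed safe to call from several threads at once (e.g. because each
    // worker uses its own connection rather than one shared connection).
    static void updateRecord(int id) {
        updated.incrementAndGet();
    }

    // Split the IDs into consecutive chunks of at most `size` elements.
    static List<List<Integer>> chunk(List<Integer> ids, int size) {
        List<List<Integer>> chunks = new ArrayList<>();
        for (int i = 0; i < ids.size(); i += size) {
            chunks.add(ids.subList(i, Math.min(i + size, ids.size())));
        }
        return chunks;
    }

    // One task per chunk on a bounded pool, instead of one raw Thread per chunk.
    static void updateAll(List<Integer> ids, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (List<Integer> c : chunk(ids, CHUNK_SIZE)) {
                futures.add(pool.submit(() -> c.forEach(ChunkedUpdater::updateRecord)));
            }
            for (Future<?> f : futures) {
                f.get(); // waits; rethrows a worker's failure as ExecutionException
            }
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Integer> ids = new ArrayList<>();
        for (int i = 1; i <= 20_000; i++) ids.add(i);
        updateAll(ids, 8);
        System.out.println(updated.get() + " records updated");
    }
}
```

With 20,000 IDs this creates 200 tasks but runs at most 8 at a time; the pool size, not the chunk count, is what bounds the load on the database, so it is the setting worth tuning in a trial run.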

But how does the DB handle these threads when they all share the same connection? I could find out by running a trial program and comparing the timings, but I feel the results may be misleading to some extent. I know you don't have much information about the database, so you can simply answer assuming it is MS SQL or MySQL.

If multi-threading is not the answer, please suggest any other Java feature I could use to make this program run faster.

Answer 1

Score: 1

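Using a single connection from multiple threads is not recommended; you can read about the pitfalls of doing so here. In short, a connection typically handles one statement at a time, so threads that share it are effectively serialized on the database side, and they also share one transaction (a commit or rollback issued by any thread applies to all of them).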
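As a rough illustration of why sharing one connection limits parallelism, here is a toy model, not a real JDBC driver: a hypothetical `SharedConnection` whose `execute` method is `synchronized`, on the assumption that the driver or server processes one statement at a time per connection, as the classic MySQL client/server protocol does. The model tracks how many statements are in flight at once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class SharedConnectionDemo {

    // HYPOTHETICAL toy connection that handles one statement at a time.
    static class SharedConnection {
        private final AtomicInteger inFlight = new AtomicInteger();
        private final AtomicInteger peak = new AtomicInteger();

        synchronized void execute(String sql) throws InterruptedException {
            int now = inFlight.incrementAndGet();
            peak.accumulateAndGet(now, Math::max); // remember the highest concurrency seen
            Thread.sleep(1); // pretend the server is doing work
            inFlight.decrementAndGet();
        }

        int peakConcurrency() {
            return peak.get();
        }
    }

    // Start `threads` workers, each issuing `perThread` statements on ONE connection.
    static int run(int threads, int perThread) throws InterruptedException {
        SharedConnection conn = new SharedConnection();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread w = new Thread(() -> {
                try {
                    for (int i = 0; i < perThread; i++) {
                        conn.execute("UPDATE ..."); // placeholder SQL
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers.add(w);
            w.start();
        }
        for (Thread w : workers) w.join();
        return conn.peakConcurrency();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("peak concurrent statements: " + run(8, 20));
    }
}
```

Running `main` prints `peak concurrent statements: 1` regardless of the thread count: the extra threads just queue for the connection, so a trial run that times one shared connection mostly measures that queueing.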

If you really do need to share a single connection among multiple threads, make sure all the threads start and finish within one transaction, and if any of them fails, roll back all of the changes. So first get the record count, split it into cursor ranges, and for each range start a thread that processes that range. One thing to watch for: do not close the connection after each partition finishes; close it only when the whole transaction is complete and committed to the database.
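The steps above can be sketched with plain JDBC as follows. This is only a sketch under stated assumptions, not the answerer's code: the `records` table, its `id` and `status` columns, and the `"DONE"` value are hypothetical placeholders; IDs are assumed to run contiguously from 1 to the count, so ranges can be computed from the count alone; and the driver is assumed to tolerate one `Connection` being used from several threads. Because statements on a shared connection are typically executed one at a time, the speed-up may be small; the point is the all-or-nothing transaction.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class RangeUpdater {

    // Split IDs 1..total into `parts` contiguous, inclusive {from, to} ranges.
    static List<long[]> ranges(long total, int parts) {
        List<long[]> result = new ArrayList<>();
        long size = (total + parts - 1) / parts; // ceiling division
        for (long from = 1; from <= total; from += size) {
            result.add(new long[] {from, Math.min(from + size - 1, total)});
        }
        return result;
    }

    // HYPOTHETICAL update: table, columns and value are placeholders.
    static int updateRange(Connection conn, long from, long to) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(
                "UPDATE records SET status = ? WHERE id BETWEEN ? AND ?")) {
            ps.setString(1, "DONE");
            ps.setLong(2, from);
            ps.setLong(3, to);
            return ps.executeUpdate();
        }
    }

    // All ranges run in ONE transaction on ONE shared connection:
    // commit only if every range succeeds, otherwise roll everything back.
    // The caller closes `conn` afterwards, never once per range.
    static void updateInOneTransaction(Connection conn, long total, int parts) throws Exception {
        conn.setAutoCommit(false);
        ExecutorService pool = Executors.newFixedThreadPool(parts);
        List<Future<Integer>> futures = new ArrayList<>();
        for (long[] r : ranges(total, parts)) {
            Callable<Integer> task = () -> updateRange(conn, r[0], r[1]);
            futures.add(pool.submit(task));
        }
        pool.shutdown();

        // Wait for EVERY range before deciding, so no thread still uses conn.
        Exception failure = null;
        for (Future<Integer> f : futures) {
            try {
                f.get();
            } catch (ExecutionException e) {
                if (failure == null) failure = e;
            }
        }
        if (failure == null) {
            conn.commit();
        } else {
            conn.rollback();
            throw failure;
        }
    }
}
```

A caller would obtain `conn` from its driver, take the count from a `SELECT COUNT(*)`, call `updateInOneTransaction(conn, count, 4)`, and close `conn` only after the method returns or throws, which matches the advice about not closing the connection per partition.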

If you have the option of using the Spring Framework, check out Spring Batch.

> Spring Batch provides reusable functions that are essential in processing large volumes of records, including logging/tracing, transaction management, job processing statistics, job restart, skip, and resource management. It also provides more advanced technical services and features that will enable extremely high-volume and high performance batch jobs through optimization and partitioning techniques. Simple as well as complex, high-volume batch jobs can leverage the framework in a highly scalable manner to process significant volumes of information.

Hope this helps.

huangapple
  • Posted on April 5, 2020 at 22:20:54
  • When reposting, please keep the link to this article: https://go.coder-hub.com/61044072.html