Entity manager is slower than direct insert

Question

I have a use case where we insert a very large number of rows into the DB, so I am measuring the performance of my inserts.

This is the entity manager implementation:

    TransactionTemplate transactionTemplate = new TransactionTemplate(platformTransactionManager);
    long start = System.currentTimeMillis();
    transactionTemplate.execute(status -> {
        try {
            int batchSize = 5000;
            int counter = 0;
            for (Data row : dataArray) {
                entityManager.persist(row);
                // Flush and detach periodically so the persistence context stays small.
                if ((counter + 1) % batchSize == 0) {
                    entityManager.flush();
                    entityManager.clear();
                }
                counter++;
            }
            // Any remaining rows are flushed when the transaction commits.
            return "DONE";
        } catch (Exception e) {
            throw new RuntimeException("Error while writing", e);
        }
    });
    long elapsedMillis = System.currentTimeMillis() - start;

The method above takes around 200 seconds to insert a data set of around 50,000 rows, while the method below takes only around 30 seconds:

    EntityManagerFactoryInfo info = (EntityManagerFactoryInfo) entityManager.getEntityManagerFactory();
    String insertQuery = "INSERT INTO my_table VALUES (?, ?)";
    int batchSize = 5000;
    try (Connection connection = info.getDataSource().getConnection();
         PreparedStatement statement = connection.prepareStatement(insertQuery)) {
        int counter = 0;
        connection.setAutoCommit(false);
        for (Data row : dataArray) {
            statement.clearParameters();
            statement.setObject(1, row.getUniqueIdentifier());
            statement.setObject(2, row.getInsertionId());
            statement.addBatch();
            if ((counter + 1) % batchSize == 0) {
                statement.executeBatch();
                connection.commit();
                statement.clearBatch();
            }
            counter++;
        }
        // Send and commit the rows left over after the last full batch.
        if (counter % batchSize != 0) {
            statement.executeBatch();
            connection.commit();
        }
    } catch (Exception e) {
        throw new RuntimeException("Error while writing: ", e);
    }

Why is the entity manager method slower?
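
The chunking logic of the JDBC loop in the question can be looked at in isolation from the database. The following is a minimal sketch, not the poster's code: `BatchSketch` and `writeInBatches` are hypothetical names, and the `sink` callback merely stands in for `executeBatch()` plus `commit()`. It also shows the flush of a trailing partial batch, which matters whenever the row count is not a multiple of the batch size.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper for illustration only; not part of the original question.
final class BatchSketch {

    /**
     * Hands rows to the sink in chunks of at most batchSize and returns the
     * number of chunks sent, including a trailing partial chunk.
     */
    static <T> int writeInBatches(List<T> rows, int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int batches = 0;
        List<T> buffer = new ArrayList<>();
        for (T row : rows) {
            buffer.add(row); // stands in for statement.addBatch()
            if (buffer.size() == batchSize) {
                sink.accept(new ArrayList<>(buffer)); // stands in for executeBatch() + commit()
                buffer.clear();
                batches++;
            }
        }
        if (!buffer.isEmpty()) {
            sink.accept(new ArrayList<>(buffer)); // flush the remainder
            batches++;
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 12; i++) {
            rows.add(i);
        }
        int batches = writeInBatches(rows, 5, chunk -> System.out.println("chunk of " + chunk.size()));
        System.out.println("batches=" + batches);
    }
}
```

With 12 rows and a batch size of 5, the sink is called with chunks of 5, 5 and 2 rows. Without the final flush, the last 2 rows would never be sent.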

Answer 1

Score: 0

RDBMSs are based on mathematical set theory and treat data as sets rather than processing it one row at a time (an iterative approach).

Every time you run a query to INSERT a row, it goes through the following steps:

  • syntactic analysis of the query
  • resolving the full name of each object if the query does not prefix it with an SQL schema (dbo...)
  • verifying in the system tables that the table(s) and column(s) exist
  • verifying that the user has the privilege to execute the command on the table
  • transforming the query text into a mathematical formula (algebra)
  • simplifying the formula
  • computing an optimal query execution plan
  • and finally executing the query...

With a row-by-row approach, all of these tasks must be done for every row, except for those whose result can be cached (such as the optimal execution plan).

When executing a batch, all of these phases run only once.
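
For the batching point above to apply to the entity manager path as well, the JPA provider has to be told to group inserts into JDBC batches. The following is a minimal sketch under a loud assumption: the provider is Hibernate, which the question does not state. `JpaBatchSettings` and `batchProperties` are hypothetical names. The two property keys are Hibernate settings, and whether they explain the timings in the question is not confirmed by the source.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; assumes Hibernate is the JPA provider.
final class JpaBatchSettings {

    /** Builds provider properties that ask Hibernate to send inserts in JDBC batches. */
    static Map<String, Object> batchProperties(int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        Map<String, Object> props = new LinkedHashMap<>();
        // Maximum number of statements Hibernate groups into one JDBC batch.
        props.put("hibernate.jdbc.batch_size", String.valueOf(batchSize));
        // Order inserts by entity type so consecutive statements can share a batch.
        props.put("hibernate.order_inserts", "true");
        return props;
    }

    public static void main(String[] args) {
        batchProperties(5000).forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

Such a map could be passed to `Persistence.createEntityManagerFactory(String, Map)`, or the same keys set in the application's JPA configuration. Note that Hibernate does not batch inserts for entities whose identifier uses IDENTITY generation, so the effect also depends on how the `Data` entity's id is generated.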

huangapple
  • Posted on July 7, 2023, 03:03:12
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76631833.html