Entity manager is slower than direct insert

Question

I have a use case where we insert a very large number of rows into the database, so I am measuring insert performance.

This is the entity manager implementation:

    TransactionTemplate transactionTemplate = new TransactionTemplate(platformTransactionManager);
    long start = System.currentTimeMillis(); // start time for the measurement
    transactionTemplate.execute(status -> {
        try {
            int batchSize = 5000;
            int counter = 0;

            for (Data row : dataArray) {
                entityManager.persist(row);

                // Flush and clear the persistence context after every full batch
                if ((counter + 1) % batchSize == 0) {
                    entityManager.flush();
                    entityManager.clear();
                }
                counter++;
            }

            return "DONE";
        } catch (Exception e) {
            throw new RuntimeException("Error while writing", e);
        }
    });

The method above takes around 200 seconds to insert a data set of about 50,000 rows, while the method below takes only around 30 seconds:

    EntityManagerFactoryInfo info = (EntityManagerFactoryInfo) entityManager.getEntityManagerFactory();
    String insertQuery = "INSERT INTO my_table VALUES (?, ?)";
    int batchSize = 5000;

    try (Connection connection = info.getDataSource().getConnection();
         PreparedStatement statement = connection.prepareStatement(insertQuery)) {

        int counter = 0;
        connection.setAutoCommit(false);

        for (Data row : dataArray) {
            statement.clearParameters();
            statement.setObject(1, row.getUniqueIdentifier());
            statement.setObject(2, row.getInsertionId());
            statement.addBatch();

            // Send and commit each full batch
            if ((counter + 1) % batchSize == 0) {
                statement.executeBatch();
                connection.commit();
                statement.clearBatch();
            }
            counter++;
        }

        // Send and commit any rows left over from a final partial batch
        statement.executeBatch();
        connection.commit();
    } catch (Exception e) {
        throw new RuntimeException("Error while writing: ", e);
    }

Why is the entity manager method slower?

Answer 1

Score: 0

An RDBMS is based on mathematical set theory and treats data as sets rather than one row at a time (an iterative approach).

Every time you run a query to INSERT a row, it goes through these steps:

  • syntactic analysis of the query
  • resolving the full name of each object if the query does not prefix it with the SQL schema (dbo...)
  • verifying in the system tables that the table(s) and column(s) exist
  • verifying that the user has the privilege to execute the command on the table
  • transforming the query text into a mathematical formula (algebra)
  • simplifying the formula
  • computing an optimal query execution plan
  • and finally executing the query...

With a row-by-row approach, all of these tasks must be done for every row, except for those that can be cached (such as the optimal execution plan).

When executing a batch, all of these phases run only once.
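To put rough numbers on this, here is a minimal sketch that only counts how many statement executions each approach would send for the data set in the question. It assumes one execution per row in the row-by-row case and one execution per batch in the batched case. The class and method names are hypothetical, and nothing here touches a database or measures real timings.

```java
// Hypothetical helper for illustration only; it counts executions, it does not run SQL.
class BatchCountSketch {

    // Row by row: assume every row is sent as its own statement execution.
    static long rowByRowExecutions(long rows) {
        return rows;
    }

    // Batched: one execution per full or partial batch, i.e. ceil(rows / batchSize).
    static long batchedExecutions(long rows, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        return (rows + batchSize - 1) / batchSize;
    }

    public static void main(String[] args) {
        long rows = 50_000;    // data set size from the question
        int batchSize = 5_000; // batch size from the question

        System.out.println("row by row: " + rowByRowExecutions(rows));           // prints 50000
        System.out.println("batched:    " + batchedExecutions(rows, batchSize)); // prints 10
    }
}
```

Under these assumptions the per-statement steps listed above would run 50,000 times in one case and 10 times in the other.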

Published on 2023-07-07 03:03:12