Entity manager is slower than direct insert
Question
I have a use case where we insert a very large number of rows into the database, so I am measuring my insert performance.
This is the entity manager implementation:
TransactionTemplate transactionTemplate = new TransactionTemplate(platformTransactionManager);
long start = System.currentTimeMillis();
transactionTemplate.execute(status -> {
    try {
        int batchSize = 5000;
        int counter = 0;
        for (Data row : dataArray) {
            entityManager.persist(row);
            // Flush and clear the persistence context every batchSize rows
            if ((counter + 1) % batchSize == 0) {
                entityManager.flush();
                entityManager.clear();
            }
            counter++;
        }
        return "DONE";
    } catch (Exception e) {
        throw new RuntimeException("Error while writing", e);
    }
});
The method above takes around 200 seconds to insert a data set of around 50,000 rows, while the method below takes only around 30 seconds:
EntityManagerFactoryInfo info = (EntityManagerFactoryInfo) entityManager.getEntityManagerFactory();
String insertQuery = String.format("INSERT INTO my_table VALUES (?, ?)");
int batchSize = 5000;
try (Connection connection = info.getDataSource().getConnection();
     PreparedStatement statement = connection.prepareStatement(insertQuery)) {
    int counter = 0;
    connection.setAutoCommit(false);
    for (Data row : dataArray) {
        statement.clearParameters();
        statement.setObject(1, row.getUniqueIdentifier());
        statement.setObject(2, row.getInsertionId());
        statement.addBatch();
        // Send and commit the accumulated batch every batchSize rows
        if ((counter + 1) % batchSize == 0) {
            statement.executeBatch();
            connection.commit();
            statement.clearBatch();
        }
        counter++;
    }
    // Note: a trailing partial batch (row count not a multiple of batchSize)
    // is never executed or committed by this code.
} catch (Exception e) {
    throw new RuntimeException("Error While Writing: ", e);
}
Why is the entity manager method slower?
Answer 1
Score: 0
RDBMSs are based on mathematical set theory and treat data as sets rather than row by row (the iterative approach).
Every time you run a query to INSERT a row, it is translated into:
- syntactic analysis of the query
- resolving the full name of the object if the query does not prefix the name with the SQL schema (dbo...)
- verifying in the system tables that the table(s) and column(s) exist
- verifying that the user has the privilege to execute the command on the table
- transforming the query text into a mathematical formula (algebra)
- simplifying the formula
- computing an optimal query execution plan
- and finally executing the query...
With a row-by-row approach, all of these tasks must be done for every row, except for those whose result can be cached (such as the optimal execution plan).
When executing a batch, all of these phases run only once.
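To put rough numbers on the difference, here is a minimal Java sketch that only counts how many statement executions each approach sends to the database. It does not touch a database at all. The class name `BatchRoundTrips` and its methods are hypothetical helpers invented for this illustration; the row count and batch size are the ones from the question.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only: counts executions, performs no I/O.
public class BatchRoundTrips {

    /** Number of statement executions when every row is sent on its own. */
    static int rowByRowExecutions(int rowCount) {
        return rowCount;
    }

    /** Number of batch executions, counting a trailing partial batch (ceiling division). */
    static int batchedExecutions(int rowCount, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        return (rowCount + batchSize - 1) / batchSize;
    }

    /** Splits rows into consecutive chunks of at most batchSize elements, keeping the remainder. */
    static <T> List<List<T>> partition(List<T> rows, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int from = 0; from < rows.size(); from += batchSize) {
            int to = Math.min(from + batchSize, rows.size());
            batches.add(new ArrayList<>(rows.subList(from, to)));
        }
        return batches;
    }

    public static void main(String[] args) {
        int rows = 50_000;      // data set size from the question
        int batchSize = 5_000;  // batch size from the question
        System.out.println("row by row: " + rowByRowExecutions(rows) + " executions");
        System.out.println("batched:    " + batchedExecutions(rows, batchSize) + " executions");
    }
}
```

For the figures in the question this gives 50,000 executions row by row against 10 batched ones. Whether `entityManager.flush()` actually sends its pending inserts as one JDBC batch depends on the JPA provider and its configuration, which the question does not state. If the provider is Hibernate (an assumption), JDBC batching is only used when `hibernate.jdbc.batch_size` is set, so a flush every 5000 rows may still issue one INSERT per row.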