英文:
Spring Data JPA Is Too Slow
问题
最近我将我的应用切换到了Spring Boot 2。我依赖于Spring Data JPA来处理所有的事务,我注意到与我以前的配置相比,速度有很大的差异。存储大约1000个元素以前大约需要6秒,现在却需要超过25秒。我在Stack Overflow上看到了一些关于使用Data JPA进行批处理的帖子,但是这些方法都没有奏效。
让我展示一下这两个配置:
实体(两者共同的部分):
@Entity
@Table(name = "category")
public class CategoryDB implements Serializable {
// ... 省略其他部分 ...
@Column(name = "parent_category", length = 24)
@JoinColumn(name = "parent_category", referencedColumnName = "category_id")
private String parentID;
// Getters & Setters
}
旧的Repository(仅显示插入部分):
@Override
public Set<String> insert(Set<CategoryDB> element) {
Set<String> ids = new HashSet<>();
Transaction tx = session.beginTransaction();
for (CategoryDB category : element) {
String id = (String) session.save(category);
ids.add(id);
}
tx.commit();
return ids;
}
旧的Hibernate XML配置文件:
<property name="show_sql">true</property>
<property name="format_sql">true</property>
<!-- 连接信息 -->
<property name="hibernate.connection.driver_class">com.mysql.cj.jdbc.Driver</property>
<property name="hibernate.dialect">org.hibernate.dialect.MySQLDialect</property>
<!-- 数据库连接池信息 -->
<property name="connection_provider_class">org.hibernate.connection.C3P0ConnectionProvider</property>
<property name="hibernate.c3p0.min_size">5</property>
<property name="hibernate.c3p0.max_size">100</property>
<property name="hibernate.c3p0.timeout">300</property>
<property name="hibernate.c3p0.max_statements">50</property>
<property name="hibernate.c3p0.idle_test_period">3000</property>
新的Repository:
@Repository
public interface CategoryRepository extends JpaRepository<CategoryDB, String> {
@Query("SELECT cat.parentID FROM CategoryDB cat WHERE cat.category_id = :#{#category.category_id}")
String getParentID(@Param("category") CategoryDB category);
}
我在我的服务中使用了saveAll()
方法。
新的application.properties:
spring.datasource.driver-class-name=com.mysql.cj.jdbc.Driver
spring.datasource.hikari.connection-timeout=6000
spring.datasource.hikari.maximum-pool-size=10
spring.jpa.properties.hibernate.show_sql=true
spring.jpa.properties.hibernate.format_sql=true
spring.jpa.properties.hibernate.generate_statistics=true
spring.jpa.properties.hibernate.dialect=org.hibernate.dialect.MySQLDialect
spring.jpa.properties.hibernate.jdbc.batch_size=50
spring.jpa.properties.hibernate.order_inserts=true
新的统计信息:
24543605 纳秒用于获取1个JDBC连接;
0 纳秒用于释放0个JDBC连接;
136919170 纳秒用于准备942个JDBC语句;
5457451561 纳秒用于执行941个JDBC语句;
19985781508 纳秒用于执行19个JDBC批处理;
0 纳秒用于执行0次L2C放置操作;
0 纳秒用于执行0次L2C命中操作;
0 纳秒用于执行0次L2C未命中操作;
20256178886 纳秒用于执行3次刷新操作(刷新了总共2823个实体和0个集合);
0 纳秒用于执行0次部分刷新操作(刷新了总共0个实体和0个集合)
可能是我在Spring方面配置有误。这是一个巨大的性能差异,我陷入了死胡同。如果您对出现了什么问题有任何提示,将不胜感激。
英文:
I recently switched my app to Spring Boot 2. I rely on Spring Data JPA to handle all transactions and I noticed a huge speed difference between this and my old configuration. Storing around 1000 elements was being done in around 6s and now it's taking over 25 seconds. I have seen SO posts about batching with Data JPA but none of these worked.
Let me show you the 2 configurations:
The entity (common to both) :
@Entity
@Table(name = "category")
public class CategoryDB implements Serializable
{
private static final long serialVersionUID = -7047292240228252349L;
@Id
@Column(name = "category_id", length = 24)
private String category_id;
@Column(name = "category_name", length = 50)
private String name;
@Column(name = "category_plural_name", length = 50)
private String pluralName;
@Column(name = "url_icon", length = 200)
private String url;
@Column(name = "parent_category", length = 24)
@JoinColumn(name = "parent_category", referencedColumnName = "category_id")
private String parentID;
//Getters & Setters
}
Old Repository (showing an insert only) :
@Override
public Set<String> insert(Set<CategoryDB> element)
{
Set<String> ids = new HashSet<>();
Transaction tx = session.beginTransaction();
for (CategoryDB category : element)
{
String id = (String) session.save(category);
ids.add(id);
}
tx.commit();
return ids;
}
Old Hibernate XML Config File:
<property name="show_sql">true</property>
<property name="format_sql">true</property>
<!-- connection information -->
<property name="hibernate.connection.driver_class">com.mysql.cj.jdbc.Driver</property>
<property name="hibernate.dialect">org.hibernate.dialect.MySQLDialect</property>
<!-- database pooling information -->
<property name="connection_provider_class">org.hibernate.connection.C3P0ConnectionProvider</property>
<property name="hibernate.c3p0.min_size">5</property>
<property name="hibernate.c3p0.max_size">100</property>
<property name="hibernate.c3p0.timeout">300</property>
<property name="hibernate.c3p0.max_statements">50</property>
<property name="hibernate.c3p0.idle_test_period">3000</property>
Old Statistics:
18949156 nanoseconds spent acquiring 2 JDBC connections;
5025322 nanoseconds spent releasing 2 JDBC connections;
33116643 nanoseconds spent preparing 942 JDBC statements;
3185229893 nanoseconds spent executing 942 JDBC statements;
0 nanoseconds spent executing 0 JDBC batches;
0 nanoseconds spent performing 0 L2C puts;
0 nanoseconds spent performing 0 L2C hits;
0 nanoseconds spent performing 0 L2C misses;
3374152568 nanoseconds spent executing 1 flushes (flushing a total of 941 entities and 0 collections);
6485 nanoseconds spent executing 1 partial-flushes (flushing a total of 0 entities and 0 collections)
New Repository:
@Repository
public interface CategoryRepository extends JpaRepository<CategoryDB,String>
{
@Query("SELECT cat.parentID FROM CategoryDB cat WHERE cat.category_id = :#{#category.category_id}")
String getParentID(@Param("category") CategoryDB category);
}
And I'm using the saveAll()
in my service.
New application.properties:
spring.datasource.driver-class-name=com.mysql.cj.jdbc.Driver
spring.datasource.hikari.connection-timeout=6000
spring.datasource.hikari.maximum-pool-size=10
spring.jpa.properties.hibernate.show_sql=true
spring.jpa.properties.hibernate.format_sql=true
spring.jpa.properties.hibernate.generate_statistics = true
spring.jpa.properties.hibernate.dialect=org.hibernate.dialect.MySQLDialect
spring.jpa.properties.hibernate.jdbc.batch_size=50
spring.jpa.properties.hibernate.order_inserts=true
New Statistics:
24543605 nanoseconds spent acquiring 1 JDBC connections;
0 nanoseconds spent releasing 0 JDBC connections;
136919170 nanoseconds spent preparing 942 JDBC statements;
5457451561 nanoseconds spent executing 941 JDBC statements;
19985781508 nanoseconds spent executing 19 JDBC batches;
0 nanoseconds spent performing 0 L2C puts;
0 nanoseconds spent performing 0 L2C hits;
0 nanoseconds spent performing 0 L2C misses;
20256178886 nanoseconds spent executing 3 flushes (flushing a total of 2823 entities and 0 collections);
0 nanoseconds spent executing 0 partial-flushes (flushing a total of 0 entities and 0 collections)
Probably , I'm misconfiguring something on behalf on Spring. This is a huge performance difference and I'm on a dead end. Any hints on what is going wrong here are very appreciated.
答案1
得分: 2
让我们合并统计数据,以便于进行比较。
旧行以o
为前缀,新行以n
为前缀。
计数为0的行将被忽略。
纳秒的测量数据被格式化,以便毫秒出现在一个空格之前。
o: 18 949156 纳秒用于获取 2 个 JDBC 连接;
n: 24 543605 纳秒用于获取 1 个 JDBC 连接;
o: 33 116643 纳秒用于准备 942 个 JDBC 语句;
n: 136 919170 纳秒用于准备 942 个 JDBC 语句;
o: 3185 229893 纳秒用于执行 942 个 JDBC 语句;
n: 5457 451561 纳秒用于执行 941 个 JDBC 语句; // 失去了约2秒
o: 0 纳秒用于执行 0 个 JDBC 批处理;
n: 19985 781508 纳秒用于执行 19 个 JDBC 批处理; // 失去了约20秒
o: 3374 152568 纳秒用于执行 1 次刷新(刷新了总共 941 个实体和 0 个集合);
n: 20256 178886 纳秒用于执行 3 次刷新(刷新了总共 2823 个实体和 0 个集合); // 失去了约20秒,处理了3倍数量的实体
o: 6485 纳秒用于执行 1 次部分刷新(刷新了总共 0 个实体和 0 个集合)
n: 0 纳秒用于执行 0 次部分刷新(刷新了总共 0 个实体和 0 个集合)
以下似乎是相关要点:
-
新版本有19个批处理,耗时20秒,在旧版本中根本不存在。
-
新版本有3次刷新,而不是1次,总共多花了20秒左右,大约是原来时间的6倍。这可能与批处理相同,因为它们很可能是这些刷新的一部分。
尽管批处理应该使事情更快,但有报告称它们会使事情变慢,尤其是在使用 MySql 时:https://stackoverflow.com/questions/20360574/why-springs-jdbctemplate-batchupdate-so-slow
这引出了一些你可以尝试/调查的事项:
- 禁用批处理,以便测试是否真的遇到了某种缓慢的批处理问题。
- 使用链接的 Stack Overflow 帖子来加速批处理。
- 记录实际执行的 SQL 语句,以找出差异。由于这将导致非常长的日志,可以尝试提取仅包含 SQL 语句的两个文件,并使用差异工具进行比较。
- 记录刷新操作,以获取为什么触发了额外的刷新的想法。
- 使用断点和调试器或额外的日志记录,找出哪些实体被刷新,以及为什么第二个变体中有更多的实体。
上述所有建议都在 JPA 上操作。
但是你的统计数据和问题内容表明你正在对单个或少数表进行简单的插入操作。
使用 JDBC(例如JdbcTemplate
)在这种情况下可能更高效,至少更容易理解。
英文:
Let's merge the statistics so they can be easily compared.
Old rows are prefixed with o
, new ones with n
.
Rows with a count of 0 are ignored.
Nanoseconds measurements are formatted so that milliseconds can are before a
.
o: 18 949156 nanoseconds spent acquiring 2 JDBC connections;
n: 24 543605 nanoseconds spent acquiring 1 JDBC connections;
o: 33 116643 nanoseconds spent preparing 942 JDBC statements;
n: 136 919170 nanoseconds spent preparing 942 JDBC statements;
o: 3185 229893 nanoseconds spent executing 942 JDBC statements;
n: 5457 451561 nanoseconds spent executing 941 JDBC statements; //loosing ~2sec
o: 0 nanoseconds spent executing 0 JDBC batches;
n: 19985 781508 nanoseconds spent executing 19 JDBC batches; // loosing ~20sec
o: 3374 152568 nanoseconds spent executing 1 flushes (flushing a total of 941 entities and 0 collections);
n: 20256 178886 nanoseconds spent executing 3 flushes (flushing a total of 2823 entities and 0 collections); // loosing ~20sec, processing 3 times the entities
o: 6485 nanoseconds spent executing 1 partial-flushes (flushing a total of 0 entities and 0 collections)
n: 0 nanoseconds spent executing 0 partial-flushes (flushing a total of 0 entities and 0 collections)
The following seem to be the relevant points:
-
The new version has 19 batches which take 20sec which don't exist in the old version at all.
-
The new version has 3 flushes instead of 1, which take together 20 sec more or about 6 times as long. This is probably more or less the same extra time as the batches since they are most certainly part of these flushes.
Although batches are supposed to make things faster, there are reports where they make things slower, especially with MySql: https://stackoverflow.com/questions/20360574/why-springs-jdbctemplate-batchupdate-so-slow
This brings us to a couple of things you can try/investigate:
- Disable batching, in order to test if you are actually suffering from some kind of slow batch problem.
- Use the linked SO post in order to speed up batching.
- log the SQL statements that actually get executed in order to find the difference.
Since this will result in rather lengthy logs to manipulate, try extracting only the SQL statements in two files and comparing them with a diff tool. - log flushes in order to get ideas why extra flushes are triggered.
- use breakpoints and a debugger or extra logging to find out what entities are getting flushed and why you have way more entities in the second variant.
All the proposals above operate on JPA.
But your statistics and question content suggest that you are doing simple inserts in a single or few tables.
Doing this with on JDBC, e.g. with a JdbcTemplate
might be more efficient and at least easier to understand.
答案2
得分: -1
你可以直接使用JDBC模板,它比Data JPA快得多。
英文:
You can use jdbc template directly it is much fast than data jpa.
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论