为什么使用 @ElementCollection 注解标记的字段不能进行 Hibernate 批量插入?

huangapple go评论93阅读模式
英文:

Why doesn't Hibernate batch inserts of fields annotated with @ElementCollection?

问题

@Entity
public class Student {

  @Id @GeneratedValue(strategy = GenerationType.IDENTITY)
  private Integer id;

  @ElementCollection(targetClass = Language.class)
  @CollectionTable(name="student_languages", joinColumns=@JoinColumn(name="student_id"))
  private Set<Language> languages;

  @ElementCollection(targetClass = Module.class)
  @CollectionTable(name="student_modules", joinColumns=@JoinColumn(name="student_id"))
  private Set<Module> modules;

  @ElementCollection(targetClass = SeatPreference.class)
  @CollectionTable(name="student_seats", joinColumns=@JoinColumn(name="student_id"))
  private Set<SeatPreference> seatPreference;

[...]
}

I know that GenerationType.IDENTITY deactivates batching, but I thought that would be the case for the main entity only, not for the single properties too. I'm having to bulk import a few entities (~20k), each with a handful of properties, but Hibernate seems to be generating one insert for each property in the sets, making the import impossibly slow (between 10 and 20 inserts for each record).

I have now spent so long trying to make this faster that I'm considering just generating an SQL file that I can manually import into the database.

Is there no way to instruct Hibernate to batch insert the @ElementCollection fields? Am I doing something wrong?


<details>
<summary>英文:</summary>

I have an `@Entity` containing a few `@OneToMany` relationships, but since they consist of collections of `Enum`s, I&#39;m using `@ElementCollection`. The entity has an id that gets generated at the database level (MySQL).

Here is a small example I just made up that corresponds to the structure of my entity.

```java
@Entity
public class Student {

  @Id @GeneratedValue(strategy = GenerationType.IDENTITY)
  private Integer id;

  @ElementCollection(targetClass = Language.class)
  @CollectionTable(name=&quot;student_languages&quot;, joinColumns=@JoinColumn(name=&quot;student_id&quot;))
  private Set&lt;Language&gt; languages;

  @ElementCollection(targetClass = Module.class)
  @CollectionTable(name=&quot;student_modules&quot;, joinColumns=@JoinColumn(name=&quot;student_id&quot;))
  private Set&lt;Module&gt; modules;

  @ElementCollection(targetClass = SeatPreference.class)
  @CollectionTable(name=&quot;student_seats&quot;, joinColumns=@JoinColumn(name=&quot;student_id&quot;))
  private Set&lt;SeatPreference&gt; seatPreference;

[...]
}

I know that GenerationType.IDENTITY deactivates batching, but I thought that would be the case for the main entity only, not for the single properties too. I'm havin to bulk import a few entities (~20k), each with a handful of properties, but Hibernate seems to be generating one insert for each property in the sets, making the import impossibly slow (between 10 and 20 inserts for each record).

I have now spent so long trying to make this faster, that I'm considering just generating an SQL file that I can manually import in the database.

Is there no way to instruct Hibernate to batch inserts the @ElementCollection fields? Am I doing something wrong?

答案1

得分: 2

基本上,似乎 Hibernate 不会对 @ElementCollection 批处理产生帮助,但是你可以使用 SQL 的 批量插入。看起来你在使用支持批量插入的 MySQL 数据库,并且如果你启用了 rewriteBatchedStatements 属性,其 JDBC 驱动可以自动将单独的插入语句修改/重写为一个批量语句。

因此在你的情况下,你需要告诉 Hibernate 启用批处理并对批量插入和更新进行排序。

hibernate.jdbc.batch_size=100
hibernate.order_inserts=true
hibernate.order_updates=true

这将确保在将数据插入数据库时,Hibernate 生成的插入语句将会批量执行,并且它们会被排序。

因此,Hibernate 生成的 SQL 将类似于:

insert into student_languages (student_id, languages) values (1,1)
insert into student_languages (student_id, languages) values (1,2)
insert into student_languages (student_id, languages) values (1,3)
insert into student_languages (student_id, languages) values (1,4)

接下来,你需要告诉 JDBC 驱动通过设置 rewriteBatchedStatements=true 将单独的插入语句重写为批量插入。

jdbc:mysql://db:3306/stack?useSSL=false&amp;rewriteBatchedStatements=true

这将指示驱动将插入语句重写为批量形式,因此上述几个 SQL 语句将被重写为类似于这样的形式:

insert into student_languages (student_id, languages) values (1,1),(1,2),(1,3),(1,4)

仅作为信息,如果你使用的是旧版本的 MySQL 驱动和 Hibernate,则此方法可能不起作用。

英文:

Basically, seem hibernate will not help with @ElementCollection batching but you can use the SQL bulk inserts.
Seems you are on MySQL which does support the bulk inserts and its JDBC driver can automatically modify / rewrite the individual insert statements into one bulk statement if you enable the rewriteBatchedStatements property.

So in your case what you need to do is tell hibernate to enable batching and order the batch inserts and updates.

hibernate.jdbc.batch_size=100
hibernate.order_inserts=true
hibernate.order_updates=true

This will ensure that when inserting the data into DB the inserts statements generated by Hibernate will be executed in a batch and they will be ordered.

So the SQL generated by Hibernate will be something like this:

insert into student_languages (student_id, languages) values (1,1)
insert into student_languages (student_id, languages) values (1,2)
insert into student_languages (student_id, languages) values (1,3)
insert into student_languages (student_id, languages) values (1,4)

Next, you will need to tell the JDBC driver to rewrite the individual inserts into the bulk insert by setting the rewriteBatchedStatements=true

jdbc:mysql://db:3306/stack?useSSL=false&amp;rewriteBatchedStatements=true

So this will instruct the driver to rewrite the inserts into bulk form, so the above several SQL statements will be rewritten into something like this

insert into student_languages (student_id, languages) values (1,1),(1,2),(1,3),(1,4)

Just as an info this may not work if you are using old versions of the MySQL driver and Hibernate.

答案2

得分: 1

我分别在MySQL和MariaDB中进行了测试,实际上Hibernate会对集合表执行批量插入操作。但这对肉眼是不可见的,你需要使用DataSource-Proxy来查看:

DEBUG n.t.d.l.l.SLF4JQueryLoggingListener - 
Name:dataSource,连接:3,时间:1,成功:True
类型:Prepared,批次:False,查询大小:1,批次大小:0
查询:["insert into student (name) values (?)"]
参数:[(Smith)]
DEBUG n.t.d.l.l.SLF4JQueryLoggingListener - 
Name:dataSource,连接:3,时间:1,成功:True
类型:Prepared,批次:False,查询大小:1,批次大小:0
查询:["insert into student (name) values (?)"]
参数:[(Snow)]
DEBUG n.t.d.l.l.SLF4JQueryLoggingListener - 
Name:dataSource,连接:3,时间:78,成功:True
类型:Prepared,批次:True,查询大小:1,批次大小:6
查询:["insert into student_languages (student_id, language) values (?, ?)"]
参数:[(6,2),(6,0),(6,1),(7,0),(7,4),(7,3)]
INFO  com.example.jpa.AddStudents - 已添加

SEQUENCE ID生成器被认为是Hibernate中最好的选择。它不会像TABLE生成器一样创建锁竞争,并且支持批处理。不幸的是,MySQL仍然不支持序列(但MariaDB支持)。

> 我做错了什么吗?

Hibernate针对数据库的小规模更改进行了优化。它维护着一级缓存,并且还支持一级缓存,但对于大规模操作来说,这只会阻碍性能。因此,的确,根据评论的建议,对于这个特定操作,你可能更好地使用JDBC或jOOQ。


我使用的是MySQL 8.0.3、MariaDB 10.5.13和Hibernate 5.6.3.Final。

英文:

I tested this both with MySQL and MariaDB and actually Hibernate does batch inserts into the collection table. But it's not visible to the naked eye, you have to use DataSource-Proxy to see it:

INFO  com.example.jpa.AddStudents - Adding students
DEBUG n.t.d.l.l.SLF4JQueryLoggingListener - 
Name:dataSource, Connection:3, Time:1, Success:True
Type:Prepared, Batch:False, QuerySize:1, BatchSize:0
Query:[&quot;insert into student (name) values (?)&quot;]
Params:[(Smith)]
DEBUG n.t.d.l.l.SLF4JQueryLoggingListener - 
Name:dataSource, Connection:3, Time:1, Success:True
Type:Prepared, Batch:False, QuerySize:1, BatchSize:0
Query:[&quot;insert into student (name) values (?)&quot;]
Params:[(Snow)]
DEBUG n.t.d.l.l.SLF4JQueryLoggingListener - 
Name:dataSource, Connection:3, Time:78, Success:True
Type:Prepared, Batch:True, QuerySize:1, BatchSize:6
Query:[&quot;insert into student_languages (student_id, language) values (?, ?)&quot;]
Params:[(6,2),(6,0),(6,1),(7,0),(7,4),(7,3)]
INFO  com.example.jpa.AddStudents - Added

The SEQUENCE ID generator is considered the best for Hibernate. It doesn't create lock contention as the TABLE generator does and allows for batching. It is unfortunate that MySQL doesn't support sequences still (MariaDB does).

> Am I doing something wrong?

Hibernate is optimized for small-scale changes in the database. It maintains a first-level cache and also supports a second-level cache which will only hinder performance for large-scale operations. Therefore, indeed, you might be better off using JDBC or jOOQ for this particular operation as was suggested in the comments.


I used MySQL 8.0.3, MariaDB 10.5.13 and Hibernate 5.6.3.Final.

huangapple
  • 本文由 发表于 2020年4月8日 22:03:47
  • 转载请务必保留本文链接:https://go.coder-hub.com/61102607.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定