Why is Hibernate splitting my batch insert into 3 queries
Question
I'm currently trying to implement a batch insert using Hibernate. Here are the few things I implemented:
1. Entity
@Entity
@Table(name = "my_bean_table")
@Data
public class MyBean {

    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "seqGen")
    @SequenceGenerator(name = "seqGen", sequenceName = "my_bean_seq", allocationSize = 50)
    @Column(name = "my_bean_id")
    private Long id;

    @Column(name = "my_bean_name")
    private String name;

    @Column(name = "my_bean_age")
    private int age;

    public MyBean(String name, int age) {
        this.name = name;
        this.age = age;
    }
}
2. application.properties
Hibernate and the datasource are configured this way:
spring.datasource.url=jdbc:postgresql://{ip}:{port}/${db}?reWriteBatchedInserts=true&loggerLevel=TRACE&loggerFile=pgjdbc.log
spring.jpa.show-sql=true
spring.jpa.properties.hibernate.jdbc.batch_size=50
spring.jpa.properties.hibernate.order_inserts=true
NB: &loggerLevel=TRACE&loggerFile=pgjdbc.log is for debugging purposes.
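For context on what reWriteBatchedInserts actually operates on, here is a minimal plain-JDBC sketch (connection details and values are placeholders, not from this post) of the kind of batch Hibernate hands to the driver: 50 executions of one identical single-row INSERT. The rewrite turns such a batch into the multi-value INSERTs seen later in the driver log.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class BatchInsertSketch {
    public static void main(String[] args) throws Exception {
        // placeholder connection details, for illustration only
        String url = "jdbc:postgresql://localhost:5432/db?reWriteBatchedInserts=true";
        try (Connection conn = DriverManager.getConnection(url, "user", "password");
             PreparedStatement ps = conn.prepareStatement(
                     "insert into my_bean_table (my_bean_age, my_bean_name, my_bean_id) values (?, ?, ?)")) {
            for (int i = 1; i <= 50; i++) {
                ps.setInt(1, 20 + i);
                ps.setString(2, "bean-" + i);
                ps.setLong(3, i);
                ps.addBatch(); // collect 50 identical single-row inserts
            }
            ps.executeBatch(); // the driver decides how to group these 50 rows
        }
    }
}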
3. Elements in my PostgreSQL database
CREATE SEQUENCE my_bean_seq
    INCREMENT 50
    START 1
    MINVALUE 1
    MAXVALUE 9223372036854775807
    CACHE 1;

CREATE TABLE my_bean_table
(
    my_bean_id   bigint NOT NULL DEFAULT nextval('my_bean_seq'::regclass),
    my_bean_name varchar(100) NOT NULL,
    my_bean_age  smallint NOT NULL,
    CONSTRAINT my_bean_table_pkey PRIMARY KEY (my_bean_id)
);
EDIT: Added ItemWriter

public class MyBeanWriter implements ItemWriter<MyBean> {

    private final Logger logger = LoggerFactory.getLogger(MyBeanWriter.class);

    @Autowired
    private MyBeanRepository repository;

    @Override
    public void write(List<? extends MyBean> items) throws Exception {
        repository.saveAll(items);
    }
}
The Spring Batch commit-interval is set to 50 as well; see the step sketch below.
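A minimal sketch (assuming Spring Batch 4's StepBuilderFactory; the step and reader names are my assumptions, not from the original post) of the chunk-oriented step this writer could belong to. chunk(50) is the Java-config equivalent of commit-interval="50", so each transaction hands Hibernate exactly one batch of 50 items.

@Bean
public Step insertStep(StepBuilderFactory stepBuilderFactory,
                       ItemReader<MyBean> reader,
                       MyBeanWriter writer) {
    return stepBuilderFactory.get("insertStep")
            .<MyBean, MyBean>chunk(50) // commit every 50 items
            .reader(reader)
            .writer(writer)
            .build();
}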
In the log file produced by the JDBC driver I get the following lines:
avr. 10, 2020 7:26:48 PM org.postgresql.core.v3.QueryExecutorImpl execute
FINEST: batch execute 3 queries, handler=org.postgresql.jdbc.BatchResultHandler@1317ac2c, maxRows=0, fetchSize=0, flags=5
avr. 10, 2020 7:26:48 PM org.postgresql.core.v3.QueryExecutorImpl sendParse
FINEST: FE=> Parse(stmt=null,query="insert into my_bean_table (my_bean_age, my_bean_name, my_bean_id) values ($1, $2, $3),($4, $5, $6),($7, $8, $9),($10, $11, $12),($13, $14, $15),($16, $17, $18),($19, $20, $21),($22, $23, $24),($25, $26, $27),($28, $29, $30),($31, $32, $33),($34, $35, $36),($37, $38, $39),($40, $41, $42),($43, $44, $45),($46, $47, $48),($49, $50, $51),($52, $53, $54),($55, $56, $57),($58, $59, $60),($61, $62, $63),($64, $65, $66),($67, $68, $69),($70, $71, $72),($73, $74, $75),($76, $77, $78),($79, $80, $81),($82, $83, $84),($85, $86, $87),($88, $89, $90),($91, $92, $93),($94, $95, $96)",oids={23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20})
...
FINEST: FE=> Execute(portal=null,limit=1)
avr. 10, 2020 7:26:48 PM org.postgresql.core.v3.QueryExecutorImpl sendParse
FINEST: FE=> Parse(stmt=null,query="insert into my_bean_table (my_bean_age, my_bean_name, my_bean_id) values ($1, $2, $3),($4, $5, $6),($7, $8, $9),($10, $11, $12),($13, $14, $15),($16, $17, $18),($19, $20, $21),($22, $23, $24),($25, $26, $27),($28, $29, $30),($31, $32, $33),($34, $35, $36),($37, $38, $39),($40, $41, $42),($43, $44, $45),($46, $47, $48)",oids={23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20})
...
avr. 10, 2020 7:26:48 PM org.postgresql.core.v3.QueryExecutorImpl sendParse
FINEST: FE=> Parse(stmt=null,query="insert into my_bean_table (my_bean_age, my_bean_name, my_bean_id) values ($1, $2, $3),($4, $5, $6)",oids={23,1043,20,23,1043,20})
Here is my question: why is the batch split into 3 queries:
- first query: 32 elements
- second query: 16 elements
- third query: 2 elements
NB: I tried setting the batch size to 100 and 200, and I still got 3 different queries.
Answer 1 (score: 2)
Found this while debugging the PgPreparedStatement class and its transformQueriesAndParameters() method:
@Override
protected void transformQueriesAndParameters() throws SQLException {
    ...
    BatchedQuery originalQuery = (BatchedQuery) preparedQuery.query;
    // Single query cannot have more than {@link Short#MAX_VALUE} binds, thus
    // the number of multi-values blocks should be capped.
    // Typically, it does not make much sense to batch more than 128 rows: performance
    // does not improve much after updating 128 statements with 1 multi-valued one, thus
    // we cap maximum batch size and split there.
    ...
    final int highestBlockCount = 128;
    final int maxValueBlocks = bindCount == 0 ? 1024 /* if no binds, use 1024 rows */
        : Integer.highestOneBit( // deriveForMultiBatch supports powers of two only
            Math.min(Math.max(1, (Short.MAX_VALUE - 1) / bindCount), highestBlockCount));
}
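Plugging this entity's three binds per row into that formula confirms the 128-row cap; here is a standalone check of the same arithmetic (my own snippet, not pgjdbc code):

public class MaxValueBlocksCheck {
    public static void main(String[] args) {
        int bindCount = 3; // my_bean_age, my_bean_name, my_bean_id
        int highestBlockCount = 128;
        int maxValueBlocks = Integer.highestOneBit(
                Math.min(Math.max(1, (Short.MAX_VALUE - 1) / bindCount), highestBlockCount));
        // (32767 - 1) / 3 = 10922, capped to 128; highestOneBit(128) = 128
        System.out.println(maxValueBlocks); // prints 128
    }
}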
Two things follow from this:
- a single rewritten query for a batch insert can contain at most 128 rows
- the row counts of the statements the driver actually sends are powers of two, as illustrated below
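Given that cap and the powers-of-two restriction, a 50-row batch decomposes greedily into 32 + 16 + 2 rows, which is exactly the three statements in the driver log. A standalone sketch of that decomposition (my illustration of the observed behaviour, not pgjdbc's actual code path):

public class PowerOfTwoSplit {
    public static void main(String[] args) {
        int remaining = 50;       // rows in the JDBC batch
        int maxValueBlocks = 128; // cap derived above
        while (remaining > 0) {
            // largest power of two that still fits the remaining rows
            int chunk = Integer.highestOneBit(Math.min(remaining, maxValueBlocks));
            System.out.println("multi-value insert with " + chunk + " rows");
            remaining -= chunk;   // prints 32, then 16, then 2
        }
    }
}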
I'm now using 128 both as the sequence increment in the database and as the batch-size parameter on the client side, and it works like a charm.
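Concretely, that means the entity's allocationSize, the database sequence's INCREMENT, hibernate.jdbc.batch_size and the Spring Batch commit-interval should all carry the same value, 128. On the entity side that would look like this (a sketch derived from the mapping in the question):

@Id
@GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "seqGen")
@SequenceGenerator(name = "seqGen", sequenceName = "my_bean_seq", allocationSize = 128)
@Column(name = "my_bean_id")
private Long id;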
Answer 2 (score: 0)
I don't have a conclusive answer, but this behaviour seems very similar to that of batch fetching, and it probably happens for the same reason.
It uses different statements with the number of parameter sets equal to powers of two.
This is to minimize the number of different statements executed.
Databases need to parse statements and use caches to hold parsed statements.
If a client executes tons of statements that are doing essentially the same thing but differ in the number of parameter sets, this would render the cache useless.
On the other hand I haven't seen it with batch inserts but only with bulk fetch operations.
I have a couple of guesses why this might be happening:
- Your ids get generated by the database, so before the data can be written to the database the ids need to be queried from the database sequence. Maybe that select behaviour then leaks through to the inserts.
- It could be an optimisation done by the JDBC driver that is rewriting this kind of statement.
- Hibernate does that all the time and I just missed it, although I think it would be weird to do this when the number of parameter sets is equal to the batch size.