A Spring JDBC query hangs accessing a large table. Will the PagingAndSortingRepository in Spring Data JPA solve the problem?

huangapple go评论77阅读模式
英文:

A Spring JDBC query hangs accessing a large table. Will the PagingAndSortingRepository in Spring Data JPA solve the problem?

问题

以下是您要翻译的部分:

A select * on a large (14 million row) table issued by our hand-crafted Java object relational manager often hangs. Basically, the code issues this query:

import org.mountsinai.foo.Entity;
import org.springframework.jdbc.core.RowMapper;
import java.sql ResultSet;
import org.springframework.jdbc.core.namedparam.NamedParameterJdbcTemplate;

// aDataSource is defined ...
NamedParameterJdbcTemplate jdbcTemplate = new NamedParameterJdbcTemplate(aDataSource);

RowMapper<Entity> entityRowMapper = (ResultSet rs, int rowNum) -> {
    Entity entity = new Entity();
    entity.setDatabaseId(rs.getInt("id"));
    entity.setEntityId(rs.getString("entity_id"));
    entity.setFound(rs.getBoolean("found"));
    // 设置其他 10 个字段...
    return entity;
};
jdbcTemplate.query("select * from table;", entityRowMapper);

Thus we're using NamedParameterJdbcTemplate's query(String sql, RowMapper<T> rowMapper) method. We use the com.microsoft.sqlserver:mssql-jdbc JDBC library.

The query goes to an SQL Server database. Sometimes everything works, especially when the table is small. But when the table is large (14 million row), jdbcTemplate.query often hangs. In this circumstance, an SQL Server Profiler trace of the database shows that the query finishes, as we see:

SP:StmtCompleted
RPC:Completed

EventClasses for the it in the trace. No exception is thrown at the client.

I'm thinking of trying to fix this problem by replacing the code which fails intermittently with a findAll call that uses Pageable in the PagingAndSortingRepository interface. Would you expect that to solve the problem?

Arthur

英文:

A select * on a large (14 million row) table issued by our hand-crafted Java object relational manager often hangs. Basically, the code issues this query:

import org.mountsinai.foo.Entity;
import org.springframework.jdbc.core.RowMapper;
import java.sql.ResultSet;
import org.springframework.jdbc.core.namedparam.NamedParameterJdbcTemplate;

// aDataSource is defined ...
NamedParameterJdbcTemplate jdbcTemplate = new NamedParameterJdbcTemplate(aDataSource);

RowMapper&lt;Entity&gt; entityRowMapper = (ResultSet rs, int rowNum) -&gt; {
    Entity entity = new Entity();
    entity.setDatabaseId(rs.getInt(&quot;id&quot;));
    entity.setEntityId(rs.getString(&quot;entity_id&quot;));
    entity.setFound(rs.getBoolean(&quot;found&quot;));
		// Set 10 more fields ...
    return entity;
};
jdbcTemplate.query(&quot;select * from table;&quot;, entityRowMapper);

Thus we're using NamedParameterJdbcTemplate's query(String sql, RowMapper<T> rowMapper) method. We use the com.microsoft.sqlserver:mssql-jdbc JDBC library.

The query goes to an SLQ Server database. Sometimes everything works, especially when the table is small.
But when the table is large (14 million row), jdbcTemplate.query often hangs. In this circumstance, an SQL Server Profiler trace of the database shows that the query finishes, as we see

SP:StmtCompleted
RPC:Completed

EventClasses for the it in the trace. No exception is thrown at the client.

I'm thinking of trying to fix this problem by replacing the code which fails intermittently with a findAll call that uses Pageable in the PagingAndSortingRepository interface. Would you expect that to solve the problem?

Arthur

答案1

得分: 1

分页可能会解决问题,但你不必使用JPA,也不会有帮助。

正如评论中所讨论的,你的问题很可能是将 1400 万个实例存储在内存中需要大量的内存。
当然,构造这些实例需要很多时间。

因此,分页可以通过将过程分解成块来帮助。
但你可以通过将偏移量和限制直接纳入查询来轻松实现当前方法。请参见 https://stackoverflow.com/a/9261762/66686

但是在大型表上,分页可能会变得很慢,更高效的方法可能是使用基于键的分页。你可以选择每批 n 行,并按主键排序,然后在下一批中选择上一批的最后 id 值之后的第一批 n 行。

一个可能更好的替代方法是使用RowCallbackHandler而不是行映射器,并在处理下一行之前完全处理一行。这会减少内存中保留的数据,同时只运行一个查询。
当然,这取决于你要如何处理所有数据,这是否是可行的方法。

如果你选择分页,可能要考虑使用Spring Batch,它专为此类处理而设计。

JPA 可能只会增加每行的内存成本和性能。
它通常不适用于如此大量的数据。

英文:

Doing pagination might solve the problem, but you don't have to use JPA for this, nor would it help.

As discussed in the comments your problem is most likely that storing 14 million instances in memory needs a lot of memory.
And of course, constructing those takes a lot of time.

Therefore pagination would help by breaking the process down into chunks.
But you can do that easily with the current approach by simply incorporating offset and limit into the query. See https://stackoverflow.com/a/9261762/66686

But pagination can become slow on large tables and it might be more efficient to use key based pagination. Where you select n rows per batch ordered by primary key and on the next batch select the first n rows after the last id value of the previous batch.

An alternative that might work even better is to use a RowCallbackHandler instead of a row mapper and fully process a row, before handling the next one. This minimises the data kept in memory, while at the same time runs only a single query.
Of course, it depends on what you try to do with all the data if this is a feasible approach.

If you go for pagination you might want to look into Spring Batch which is intended for this kind of processing.

JPA would likely just increase the cost in memory and performance per row.
It is generally not well suited for such large amounts of data.

huangapple
  • 本文由 发表于 2023年2月23日 22:52:26
  • 转载请务必保留本文链接:https://go.coder-hub.com/75546463.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定