英文:
A Spring JDBC query hangs accessing a large table. Will the PagingAndSortingRepository in Spring Data JPA solve the problem?
问题
以下是您要翻译的部分:
A select *
on a large (14 million row) table issued by our hand-crafted Java object relational manager often hangs. Basically, the code issues this query:
import org.mountsinai.foo.Entity;
import org.springframework.jdbc.core.RowMapper;
import java.sql ResultSet;
import org.springframework.jdbc.core.namedparam.NamedParameterJdbcTemplate;
// aDataSource is defined ...
NamedParameterJdbcTemplate jdbcTemplate = new NamedParameterJdbcTemplate(aDataSource);
RowMapper<Entity> entityRowMapper = (ResultSet rs, int rowNum) -> {
Entity entity = new Entity();
entity.setDatabaseId(rs.getInt("id"));
entity.setEntityId(rs.getString("entity_id"));
entity.setFound(rs.getBoolean("found"));
// 设置其他 10 个字段...
return entity;
};
jdbcTemplate.query("select * from table;", entityRowMapper);
Thus we're using NamedParameterJdbcTemplate
's query(String sql, RowMapper<T> rowMapper) method. We use the com.microsoft.sqlserver:mssql-jdbc
JDBC library.
The query goes to an SQL Server database. Sometimes everything works, especially when the table is small. But when the table is large (14 million row), jdbcTemplate.query
often hangs. In this circumstance, an SQL Server Profiler trace of the database shows that the query finishes, as we see:
SP:StmtCompleted
RPC:Completed
EventClass
es for the it in the trace. No exception is thrown at the client.
I'm thinking of trying to fix this problem by replacing the code which fails intermittently with a findAll
call that uses Pageable in the PagingAndSortingRepository
interface. Would you expect that to solve the problem?
Arthur
英文:
A select *
on a large (14 million row) table issued by our hand-crafted Java object relational manager often hangs. Basically, the code issues this query:
import org.mountsinai.foo.Entity;
import org.springframework.jdbc.core.RowMapper;
import java.sql.ResultSet;
import org.springframework.jdbc.core.namedparam.NamedParameterJdbcTemplate;
// aDataSource is defined ...
NamedParameterJdbcTemplate jdbcTemplate = new NamedParameterJdbcTemplate(aDataSource);
RowMapper<Entity> entityRowMapper = (ResultSet rs, int rowNum) -> {
Entity entity = new Entity();
entity.setDatabaseId(rs.getInt("id"));
entity.setEntityId(rs.getString("entity_id"));
entity.setFound(rs.getBoolean("found"));
// Set 10 more fields ...
return entity;
};
jdbcTemplate.query("select * from table;", entityRowMapper);
Thus we're using NamedParameterJdbcTemplate
's query(String sql, RowMapper<T> rowMapper) method. We use the com.microsoft.sqlserver:mssql-jdbc
JDBC library.
The query goes to an SLQ Server database. Sometimes everything works, especially when the table is small.
But when the table is large (14 million row), jdbcTemplate.query
often hangs. In this circumstance, an SQL Server Profiler trace of the database shows that the query finishes, as we see
SP:StmtCompleted
RPC:Completed
EventClass
es for the it in the trace. No exception is thrown at the client.
I'm thinking of trying to fix this problem by replacing the code which fails intermittently with a findAll
call that uses Pageable in the PagingAndSortingRepository
interface. Would you expect that to solve the problem?
Arthur
答案1
得分: 1
分页可能会解决问题,但你不必使用JPA,也不会有帮助。
正如评论中所讨论的,你的问题很可能是将 1400 万个实例存储在内存中需要大量的内存。
当然,构造这些实例需要很多时间。
因此,分页可以通过将过程分解成块来帮助。
但你可以通过将偏移量和限制直接纳入查询来轻松实现当前方法。请参见 https://stackoverflow.com/a/9261762/66686
但是在大型表上,分页可能会变得很慢,更高效的方法可能是使用基于键的分页。你可以选择每批 n 行,并按主键排序,然后在下一批中选择上一批的最后 id 值之后的第一批 n 行。
一个可能更好的替代方法是使用RowCallbackHandler
而不是行映射器,并在处理下一行之前完全处理一行。这会减少内存中保留的数据,同时只运行一个查询。
当然,这取决于你要如何处理所有数据,这是否是可行的方法。
如果你选择分页,可能要考虑使用Spring Batch,它专为此类处理而设计。
JPA 可能只会增加每行的内存成本和性能。
它通常不适用于如此大量的数据。
英文:
Doing pagination might solve the problem, but you don't have to use JPA for this, nor would it help.
As discussed in the comments your problem is most likely that storing 14 million instances in memory needs a lot of memory.
And of course, constructing those takes a lot of time.
Therefore pagination would help by breaking the process down into chunks.
But you can do that easily with the current approach by simply incorporating offset and limit into the query. See https://stackoverflow.com/a/9261762/66686
But pagination can become slow on large tables and it might be more efficient to use key based pagination. Where you select n rows per batch ordered by primary key and on the next batch select the first n rows after the last id value of the previous batch.
An alternative that might work even better is to use a RowCallbackHandler
instead of a row mapper and fully process a row, before handling the next one. This minimises the data kept in memory, while at the same time runs only a single query.
Of course, it depends on what you try to do with all the data if this is a feasible approach.
If you go for pagination you might want to look into Spring Batch which is intended for this kind of processing.
JPA would likely just increase the cost in memory and performance per row.
It is generally not well suited for such large amounts of data.
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论