2023年2月23日 22:52:26go评论126阅读模式

英文:

A Spring JDBC query hangs accessing a large table. Will the PagingAndSortingRepository in Spring Data JPA solve the problem?

问题

以下是您要翻译的部分：

A select * on a large (14 million row) table issued by our hand-crafted Java object relational manager often hangs. Basically, the code issues this query:

import org.mountsinai.foo.Entity;
import org.springframework.jdbc.core.RowMapper;
import java.sql ResultSet;
import org.springframework.jdbc.core.namedparam.NamedParameterJdbcTemplate;
// aDataSource is defined ...
NamedParameterJdbcTemplate jdbcTemplate = new NamedParameterJdbcTemplate(aDataSource);
RowMapper&lt;Entity&gt; entityRowMapper = (ResultSet rs, int rowNum) -&gt; {
    Entity entity = new Entity();
    entity.setDatabaseId(rs.getInt(&quot;id&quot;));
    entity.setEntityId(rs.getString(&quot;entity_id&quot;));
    entity.setFound(rs.getBoolean(&quot;found&quot;));
    // 设置其他 10 个字段...
    return entity;
};
jdbcTemplate.query(&quot;select * from table;&quot;, entityRowMapper);

Thus we're using NamedParameterJdbcTemplate's query(String sql, RowMapper<T> rowMapper) method. We use the com.microsoft.sqlserver:mssql-jdbc JDBC library.

The query goes to an SQL Server database. Sometimes everything works, especially when the table is small. But when the table is large (14 million row), jdbcTemplate.query often hangs. In this circumstance, an SQL Server Profiler trace of the database shows that the query finishes, as we see:

SP:StmtCompleted
RPC:Completed

EventClasses for the it in the trace. No exception is thrown at the client.

I'm thinking of trying to fix this problem by replacing the code which fails intermittently with a findAll call that uses Pageable in the PagingAndSortingRepository interface. Would you expect that to solve the problem?

Arthur

英文:

A select * on a large (14 million row) table issued by our hand-crafted Java object relational manager often hangs. Basically, the code issues this query:

import org.mountsinai.foo.Entity;
import org.springframework.jdbc.core.RowMapper;
import java.sql.ResultSet;
import org.springframework.jdbc.core.namedparam.NamedParameterJdbcTemplate;
// aDataSource is defined ...
NamedParameterJdbcTemplate jdbcTemplate = new NamedParameterJdbcTemplate(aDataSource);
RowMapper&lt;Entity&gt; entityRowMapper = (ResultSet rs, int rowNum) -&gt; {
    Entity entity = new Entity();
    entity.setDatabaseId(rs.getInt(&quot;id&quot;));
    entity.setEntityId(rs.getString(&quot;entity_id&quot;));
    entity.setFound(rs.getBoolean(&quot;found&quot;));
		// Set 10 more fields ...
    return entity;
};
jdbcTemplate.query(&quot;select * from table;&quot;, entityRowMapper);

Thus we're using NamedParameterJdbcTemplate's query(String sql, RowMapper<T> rowMapper) method. We use the com.microsoft.sqlserver:mssql-jdbc JDBC library.

The query goes to an SLQ Server database. Sometimes everything works, especially when the table is small.
But when the table is large (14 million row), jdbcTemplate.query often hangs. In this circumstance, an SQL Server Profiler trace of the database shows that the query finishes, as we see

SP:StmtCompleted
RPC:Completed

EventClasses for the it in the trace. No exception is thrown at the client.

Arthur

答案1

得分: 1

分页可能会解决问题，但你不必使用JPA，也不会有帮助。

正如评论中所讨论的，你的问题很可能是将 1400 万个实例存储在内存中需要大量的内存。
当然，构造这些实例需要很多时间。

因此，分页可以通过将过程分解成块来帮助。
但你可以通过将偏移量和限制直接纳入查询来轻松实现当前方法。请参见 https://stackoverflow.com/a/9261762/66686

但是在大型表上，分页可能会变得很慢，更高效的方法可能是使用基于键的分页。你可以选择每批 n 行，并按主键排序，然后在下一批中选择上一批的最后 id 值之后的第一批 n 行。

一个可能更好的替代方法是使用RowCallbackHandler而不是行映射器，并在处理下一行之前完全处理一行。这会减少内存中保留的数据，同时只运行一个查询。
当然，这取决于你要如何处理所有数据，这是否是可行的方法。

如果你选择分页，可能要考虑使用Spring Batch，它专为此类处理而设计。

JPA 可能只会增加每行的内存成本和性能。
它通常不适用于如此大量的数据。

英文:

Doing pagination might solve the problem, but you don't have to use JPA for this, nor would it help.

As discussed in the comments your problem is most likely that storing 14 million instances in memory needs a lot of memory.
And of course, constructing those takes a lot of time.

Therefore pagination would help by breaking the process down into chunks.
But you can do that easily with the current approach by simply incorporating offset and limit into the query. See https://stackoverflow.com/a/9261762/66686

But pagination can become slow on large tables and it might be more efficient to use key based pagination. Where you select n rows per batch ordered by primary key and on the next batch select the first n rows after the last id value of the previous batch.

An alternative that might work even better is to use a RowCallbackHandler instead of a row mapper and fully process a row, before handling the next one. This minimises the data kept in memory, while at the same time runs only a single query.
Of course, it depends on what you try to do with all the data if this is a feasible approach.

If you go for pagination you might want to look into Spring Batch which is intended for this kind of processing.

JPA would likely just increase the cost in memory and performance per row.
It is generally not well suited for such large amounts of data.

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

A Spring JDBC query hangs accessing a large table. Will the PagingAndSortingRepository in Spring Data JPA solve the problem?

问题

答案1

在Spring Boot应用中使用R2DBC和Flyway设置h2

解锁是如何在等待之后发生的

德信斯·马夫恩集成插件故障

如何在Java 8中使用Observable的zip功能

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。