Is it possible to get a TON of data with Spring Boot, or should we try something else?

Question

I am running out of memory when running a select query on a table with 232 million records. The field I am selecting is a 17-character String. My task is to connect to a specific table, fetch the selected data, use that data to call an API, and then ingest the result into another table. Does anyone have any suggestions? I am keeping this general since I cannot share the codebase, but I am wondering whether this design can work with such a large dataset.

Below is my runner:

 @Override
    public void run(ApplicationArguments args) {
        final long startTimeAt = System.currentTimeMillis();

        // List with all the data from AT database
        List<Data> data = DataRepository.getData(); //ERROR
        final long endTimeAt = System.currentTimeMillis();

        log.info("AT data retrieval took time: {}", endTimeAt - startTimeAt);

This is how I am getting the data:

@Override
    public List<Data> getData() {
        jdbcTemplate.setFetchSize(10000);
        return jdbcTemplate.query(getVinDataQuery, new BeanPropertyRowMapper<>(Data.class));
    }

If it helps, here is the error I am getting:
java.lang.OutOfMemoryError: Java heap space

Answer 1

Score: 2

You don't need to load the full dataset into memory; instead, process it on the fly:

class JdbcTemplate {
    public void query(String sql, RowCallbackHandler rch) throws DataAccessException;
}

For example:

jdbcTemplate.query(getVinDataQuery, rs -> {
    // process each row...
    final String id = rs.getString("id");
    final String text = rs.getString("text");

    doProcessRow(new Data(id, text));
});
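
Since each row then feeds an API call and an insert into another table, it may help to group rows into small batches inside the callback rather than calling the API once per row. The following is only a rough sketch of that idea in plain Java: `BatchingConsumer`, `callApiAndInsert`, the batch size of 1,000 and the column name "vin" are all hypothetical names chosen for illustration, not part of the original code or of Spring.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Collects items into fixed-size batches and hands each full batch to a sink.
 * Only one batch is buffered at a time, so memory use stays bounded
 * no matter how many rows the query returns.
 */
final class BatchingConsumer<T> implements Consumer<T> {
    private final int batchSize;
    private final Consumer<List<T>> sink;
    private List<T> buffer;

    BatchingConsumer(int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.sink = sink;
        this.buffer = new ArrayList<>(batchSize);
    }

    /** Adds one item and flushes automatically once the batch is full. */
    @Override
    public void accept(T item) {
        buffer.add(item);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    /** Sends any buffered items to the sink; call once after the last row. */
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        List<T> full = buffer;
        buffer = new ArrayList<>(batchSize); // start a fresh batch before handing off
        sink.accept(full);
    }
}

// Hypothetical wiring with the callback style shown above
// (callApiAndInsert and the "vin" column are assumptions):
//
//   BatchingConsumer<String> batcher =
//       new BatchingConsumer<>(1_000, this::callApiAndInsert);
//   jdbcTemplate.query(getVinDataQuery, rs -> {
//       batcher.accept(rs.getString("vin"));
//   });
//   batcher.flush(); // handle the final, partially filled batch
```

Note that whether `setFetchSize` actually makes the driver stream rows depends on the JDBC driver in use; the PostgreSQL driver, for instance, only fetches through a cursor when auto-commit is off, so it is worth checking the driver documentation if memory use still grows.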
