Is it possible to fetch a huge amount of data with Spring Boot, or should we try something else?

Question

I run out of memory when running a SELECT query against a table with 232 million records. The selected field is a 17-character String. My task is to connect to a specific table, fetch the selected data, use that data to call an API, and then ingest the results into another table. Unsurprisingly, the application runs out of memory. Does anyone have any suggestions? I am keeping this general because I cannot share the codebase, but I am wondering whether this design can work at all with such a large dataset.

Below is my runner.

    @Override
    public void run(ApplicationArguments args) {
        final long startTimeAt = System.currentTimeMillis();
        // List with all the data from AT database
        List<Data> data = DataRepository.getData(); // ERROR: OutOfMemoryError is thrown here
        final long endTimeAt = System.currentTimeMillis();
        log.info("AT data retrieval took time: {}", endTimeAt - startTimeAt);
        // ...
    }

This is how I am getting the data:

    @Override
    public List<Data> getData() {
        jdbcTemplate.setFetchSize(10000);
        return jdbcTemplate.query(getVinDataQuery, new BeanPropertyRowMapper<>(Data.class));
    }

If it helps, here is the error I am getting:
java.lang.OutOfMemoryError: Java heap space

Answer 1

Score: 2

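You don't need to load the full dataset into memory. Instead, process it on the fly, one row at a time, using the JdbcTemplate query overload that takes a RowCallbackHandler: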

    class JdbcTemplate {
        public void query(String sql, RowCallbackHandler rch) throws DataAccessException;
    }

For example:

    jdbcTemplate.query(getVinDataQuery, rs -> {
        // process each row as it arrives instead of collecting it into a List
        final String id = rs.getString("id");
        final String text = rs.getString("text");
        doProcessRow(new Data(id, text));
    });
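
Since each row has to be sent to an API and then written to another table, handling rows one at a time may be slow. One possible variation, sketched below, is to buffer the streamed rows and hand them on in fixed-size batches, so that only one batch is held in memory at a time. The RowBatcher class, its method names, and the batch handler are hypothetical helpers introduced for illustration; they are not part of Spring or of the original code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical helper: collects rows as they are streamed and passes them
 * on in batches, so at most batchSize rows are buffered at any time.
 */
class RowBatcher<T> {
    private final int batchSize;
    // Assumed to do the real work, e.g. call the API and batch-insert the results.
    private final Consumer<List<T>> batchHandler;
    private List<T> buffer;

    RowBatcher(int batchSize, Consumer<List<T>> batchHandler) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.batchHandler = batchHandler;
        this.buffer = new ArrayList<>(batchSize);
    }

    /** Called once per row, for example from inside a RowCallbackHandler. */
    void add(T row) {
        buffer.add(row);
        if (buffer.size() == batchSize) {
            flush();
        }
    }

    /** Call once after the query returns to process the final partial batch. */
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        List<T> batch = buffer;
        buffer = new ArrayList<>(batchSize); // start a fresh buffer before handing off
        batchHandler.accept(batch);
    }
}
```

With the query from the answer, the wiring might look like jdbcTemplate.query(getVinDataQuery, rs -> batcher.add(rs.getString("id"))); followed by batcher.flush(); once the query has returned. Whether rows are actually streamed also depends on the JDBC driver honouring the fetch size; some drivers buffer the whole result set unless configured otherwise.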

huangapple
  • Posted on April 17, 2023 at 22:40:47
  • Please keep the link to this article when reposting: https://go.coder-hub.com/76036355.html