Is it possible to retrieve a huge amount of data with Spring Boot, or should we try something else?
Question
I am running out of memory when I run a SELECT query on a table with 232 million records. The field I select is a 17-character String. My task is basically to connect to a specific table, fetch the selected data, use that data to call an API, and then ingest the results into another table. As you would expect, I am running out of memory. Does anyone have any suggestions? I am keeping this general because I cannot share the codebase, but I am wondering whether this design can work with such a large dataset at all.
Below is my runner:
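@Override
public void run(ApplicationArguments args) {
    final long startTimeAt = System.currentTimeMillis();
    // List with all the data from the AT database
    // (dataRepository is the injected repository bean; getData() is an instance method)
    List<Data> data = dataRepository.getData(); // java.lang.OutOfMemoryError is thrown here
    final long endTimeAt = System.currentTimeMillis();
    log.info("AT data retrieval took time: {} ms", endTimeAt - startTimeAt);
    // ... rest of the runner (call the API, write to the other table)
}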
This is how I am getting the data:
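@Override
public List<Data> getData() {
    // The fetch size only controls how many rows the driver fetches per round trip;
    // it does not limit how many rows end up in memory in total.
    jdbcTemplate.setFetchSize(10000);
    // BeanPropertyRowMapper maps every row to a Data object and query(...) collects
    // ALL of them into a single List, so every one of the 232 million rows is kept on the heap.
    return jdbcTemplate.query(getVinDataQuery, new BeanPropertyRowMapper<>(Data.class));
}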
If it helps, here is the error I am getting:
java.lang.OutOfMemoryError: Java heap space
Answer 1
Score: 2
You don't need to load the full dataset into memory; instead, process it on the fly with the query overload that takes a RowCallbackHandler:
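// Signature excerpt from org.springframework.jdbc.core.JdbcTemplate (declaration only, not compilable on its own)
class JdbcTemplate {
    public void query(String sql, RowCallbackHandler rch) throws DataAccessException;
}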
For example:
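// The lambda is a RowCallbackHandler: it is called once per row and returns nothing,
// so no List of all rows is ever built.
jdbcTemplate.query(getVinDataQuery, rs -> {
    // process each row... ("id", "text" and doProcessRow are placeholders for your own columns/logic)
    final String id = rs.getString("id");
    final String text = rs.getString("text");
    doProcessRow(new Data(id, text));
});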
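The same idea can be extended to the whole pipeline from the question (read, call the API, write to another table) by grouping the streamed rows into fixed-size batches, so that memory use is bounded by the batch size instead of by the 232 million rows. Below is a minimal, framework-free sketch under that assumption; BatchingRowProcessor and its methods are hypothetical names (not part of Spring), and the sink is a stand-in for whatever calls the API and batch-inserts into the target table (for example with JdbcTemplate.batchUpdate).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical helper: receives values one at a time and hands them to a sink
 * in fixed-size batches, so at most batchSize values are buffered at once.
 */
final class BatchingRowProcessor implements AutoCloseable {
    private final int batchSize;
    private final Consumer<List<String>> sink; // e.g. "call the API, then batch-insert the results"
    private final List<String> buffer;

    BatchingRowProcessor(int batchSize, Consumer<List<String>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.sink = sink;
        this.buffer = new ArrayList<>(batchSize);
    }

    /** Called once per row, e.g. from inside the RowCallbackHandler lambda. */
    void accept(String value) {
        buffer.add(value);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    /** Passes the buffered values (if any) to the sink, then clears the buffer. */
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(buffer)); // copy, because the buffer is reused afterwards
        buffer.clear();
    }

    /** Flushes the final, possibly partial, batch. */
    @Override
    public void close() {
        flush();
    }

    public static void main(String[] args) {
        List<Integer> batchSizes = new ArrayList<>();
        // Simulate streaming 10 rows with a batch size of 4.
        try (BatchingRowProcessor processor =
                     new BatchingRowProcessor(4, batch -> batchSizes.add(batch.size()))) {
            for (int i = 0; i < 10; i++) {
                processor.accept(String.format("VIN%014d", i)); // 17-character placeholder value
            }
        }
        System.out.println(batchSizes); // prints [4, 4, 2]
    }
}
```

In the application, processor.accept(rs.getString(...)) would be called inside the RowCallbackHandler lambda above, and close() after jdbcTemplate.query returns. Note that the callback only keeps memory flat if the JDBC driver actually streams the result set: the PostgreSQL driver, for example, only fetches in chunks of the fetch size when autocommit is off (such as inside a transaction), and MySQL Connector/J buffers the whole result unless the fetch size is Integer.MIN_VALUE or useCursorFetch=true is set. Check your driver's documentation.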