Springbatch read from csv, how does it work?
Question
I am new to Spring Batch, and I wonder how the reader/processor/writer works if I read a CSV file containing 10k rows, use a chunk size of 10, and output to another CSV file.
My question is:
Does Spring Batch load all 10k rows from the CSV at once, process them individually (10k times), and then store all of them in the destination file in one go? If so, what's the point of using Spring Batch? I could have three methods doing the same job, right?
Or:
Does Spring Batch open a stream over the CSV and, each time, read 10 rows, process those 10 rows, and open an output stream to write/append those 10 rows to the destination file? Basically repeating 10k / 10 = 1k times.
@Configuration
public class SampleJob3 {

    @Bean
    public Job job3(JobRepository jobRepository, PlatformTransactionManager transactionManager) {
        return new JobBuilder("Job3", jobRepository)
                .incrementer(new RunIdIncrementer()) // work with program args
                .start(step(jobRepository, transactionManager))
                .build();
    }

    private Step step(JobRepository jobRepository, PlatformTransactionManager transactionManager) {
        return new StepBuilder("Job3 Step started ")
                .<Student, Student>chunk(3)
                .repository(jobRepository)
                .transactionManager(transactionManager)
                .reader(reader(true))
                .processor(student -> {
                    System.out.println("processor");
                    return new Student(student.getId(), student.getFirstName() + "!",
                            student.getLastName() + "!", student.getEmail() + "!");
                })
                .writer(writer())
                .build();
    }

    private FlatFileItemReader<Student> reader(boolean isValid) {
        System.out.println("reader");
        FlatFileItemReader<Student> reader = new FlatFileItemReader<>();
        // use FileSystemResource if the file is stored in a directory instead of the resources folder
        reader.setResource(new PathMatchingResourcePatternResolver()
                .getResource(isValid ? "input/students.csv" : "input/students_invalid.csv"));
        reader.setLineMapper(new DefaultLineMapper<>() {{
            setLineTokenizer(new DelimitedLineTokenizer() {{
                setNames("ID", "First Name", "Last Name", "Email");
            }});
            setFieldSetMapper(new BeanWrapperFieldSetMapper<>() {{
                setTargetType(Student.class);
            }});
        }});
        reader.setLinesToSkip(1);
        return reader;
    }

    //@Bean
    public FlatFileItemWriter<Student> writer() {
        System.out.println("writer");
        FlatFileItemWriter<Student> writer = new FlatFileItemWriter<>();
        writer.setResource(new FileSystemResource("output/students.csv"));
        writer.setHeaderCallback(writer1 -> writer1.write("Id,First Name,Last Name, Email"));
        writer.setLineAggregator(new DelimitedLineAggregator<>() {{
            setFieldExtractor(new BeanWrapperFieldExtractor<>() {{
                setNames(new String[]{"id", "firstName", "lastName", "email"});
            }});
        }});
        writer.setFooterCallback(writer12 -> writer12.write("Created @ " + Instant.now()));
        return writer;
    }
}
My last question is basically the same, but the data source is a database, e.g. reading a table containing 10k rows from dbA and writing to dbB. Am I able to read 10 rows at a time, process them, and write them to dbB? If so, can you share some pseudocode?
Answer 1
Score: 0
A chunk-oriented step in Spring Batch will not read the entire file or table at once. Rather, it will stream data from the source in chunks (of a configurable size).
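Roughly speaking, the step you configured behaves like the loop below. This is only a conceptual sketch of the chunk-oriented model, not the actual Spring Batch implementation (which also handles transactions, restart metadata and fault tolerance); reader, processor, writer and chunkSize stand for the components and value you configured, and the types used (ItemReader, ItemProcessor, ItemWriter, Chunk) come from org.springframework.batch.item in Spring Batch 5:

    // Conceptual sketch only -- not the real chunk-oriented tasklet.
    static void runChunkOrientedStep(ItemReader<Student> reader,
                                     ItemProcessor<Student, Student> processor,
                                     ItemWriter<Student> writer,
                                     int chunkSize) throws Exception {
        boolean exhausted = false;
        while (!exhausted) {
            // 1. Read: one row at a time, until the chunk is full or the input ends.
            List<Student> items = new ArrayList<>();
            while (items.size() < chunkSize) {
                Student item = reader.read();   // reads ONE line of the CSV
                if (item == null) {             // end of input
                    exhausted = true;
                    break;
                }
                items.add(item);
            }
            if (items.isEmpty()) {
                break;
            }

            // 2. Process: each item individually; returning null filters an item out.
            List<Student> processed = new ArrayList<>();
            for (Student item : items) {
                Student out = processor.process(item);
                if (out != null) {
                    processed.add(out);
                }
            }

            // 3. Write: the whole chunk in one call, then the chunk's transaction commits.
            //    With 10k input rows and a chunk size of 10, this loop runs about 1,000 times.
            writer.write(new Chunk<>(processed));
        }
    }

So the full 10k rows are never held in memory at once; only the current chunk is.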
You can find more details about the processing model in the reference documentation here: Chunk-oriented Processing.
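For the database-to-database part of the question, the model is exactly the same; only the reader and writer implementations change. Below is a minimal, untested sketch assuming Spring Batch 5, two configured DataSource beans (named dataSourceA and dataSourceB here purely for illustration) and a Student bean whose properties match the columns; in a real application these methods would typically be @Bean definitions:

    import javax.sql.DataSource;

    import org.springframework.batch.core.Step;
    import org.springframework.batch.core.repository.JobRepository;
    import org.springframework.batch.core.step.builder.StepBuilder;
    import org.springframework.batch.item.database.JdbcBatchItemWriter;
    import org.springframework.batch.item.database.JdbcCursorItemReader;
    import org.springframework.batch.item.database.builder.JdbcBatchItemWriterBuilder;
    import org.springframework.batch.item.database.builder.JdbcCursorItemReaderBuilder;
    import org.springframework.jdbc.core.BeanPropertyRowMapper;
    import org.springframework.transaction.PlatformTransactionManager;

    public class CopyStepConfig {

        // Streams rows from dbA through a JDBC cursor; rows are fetched as the
        // cursor advances, never all 10k at once.
        JdbcCursorItemReader<Student> dbReader(DataSource dataSourceA) {
            return new JdbcCursorItemReaderBuilder<Student>()
                    .name("studentDbReader")
                    .dataSource(dataSourceA)
                    .sql("SELECT id, first_name, last_name, email FROM student")
                    .rowMapper(new BeanPropertyRowMapper<>(Student.class))
                    .build();
        }

        // Writes each chunk to dbB as one batched insert.
        JdbcBatchItemWriter<Student> dbWriter(DataSource dataSourceB) {
            return new JdbcBatchItemWriterBuilder<Student>()
                    .dataSource(dataSourceB)
                    .sql("INSERT INTO student (id, first_name, last_name, email) "
                            + "VALUES (:id, :firstName, :lastName, :email)")
                    .beanMapped()
                    .build();
        }

        Step copyStep(JobRepository jobRepository, PlatformTransactionManager transactionManager,
                      DataSource dataSourceA, DataSource dataSourceB) {
            return new StepBuilder("copyStep", jobRepository)
                    .<Student, Student>chunk(10, transactionManager) // 10 rows per chunk
                    .reader(dbReader(dataSourceA))
                    .processor(student -> student)                   // optional processing
                    .writer(dbWriter(dataSourceB))
                    .build();
        }
    }

With this setup the step reads 10 rows from dbA, processes them, and writes them to dbB per chunk, committing once per chunk; a JdbcPagingItemReader is an alternative reader that fetches the source table page by page instead of through a cursor.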