Spring Batch reading from a CSV file: how does it work?

Question

I am new to Spring Batch, and I wonder how the reader/processor/writer model works if I read a CSV file containing 10k rows, use a chunk size of 10, and write the output to a CSV file.

My question is:

Does Spring Batch load all 10k rows from the CSV file at once, process them individually (10k times), and then store all of them in the destination file in one go? If so, what's the point of using Spring Batch? I could write three methods that do the same job, right?

Or:

Does Spring Batch open a stream over the CSV file, read 10 rows at a time, process those 10 rows, and then open an output stream to write/append those 10 rows to the destination file? That would basically repeat 10k/10 = 1k times.


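import java.time.Instant;

import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.job.builder.JobBuilder;
import org.springframework.batch.core.launch.support.RunIdIncrementer;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.batch.core.step.builder.StepBuilder;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.FlatFileItemWriter;
import org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper;
import org.springframework.batch.item.file.mapping.DefaultLineMapper;
import org.springframework.batch.item.file.transform.BeanWrapperFieldExtractor;
import org.springframework.batch.item.file.transform.DelimitedLineAggregator;
import org.springframework.batch.item.file.transform.DelimitedLineTokenizer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.ClassPathResource;
import org.springframework.core.io.FileSystemResource;
import org.springframework.transaction.PlatformTransactionManager;

@Configuration
public class SampleJob3 {

    @Bean
    public Job job3(JobRepository jobRepository, PlatformTransactionManager transactionManager) {
        return new JobBuilder("Job3", jobRepository)
                .incrementer(new RunIdIncrementer()) // work with program args
                .start(step(jobRepository, transactionManager))
                .build();
    }

    private Step step(JobRepository jobRepository, PlatformTransactionManager transactionManager) {
        // Spring Batch 5 style: repository in the builder, transaction manager in chunk()
        return new StepBuilder("Job3Step", jobRepository)
                .<Student, Student>chunk(3, transactionManager) // 3 items per chunk/transaction
                .reader(reader(true))
                .processor(student -> {
                    System.out.println("processor");
                    return new Student(student.getId(), student.getFirstName() + "!",
                            student.getLastName() + "!", student.getEmail() + "!");
                })
                .writer(writer())
                .build();
    }

    private FlatFileItemReader<Student> reader(boolean isValid) {
        System.out.println("reader");
        // Column names must match the Student bean properties
        DelimitedLineTokenizer tokenizer = new DelimitedLineTokenizer();
        tokenizer.setNames("id", "firstName", "lastName", "email");

        BeanWrapperFieldSetMapper<Student> fieldSetMapper = new BeanWrapperFieldSetMapper<>();
        fieldSetMapper.setTargetType(Student.class);

        DefaultLineMapper<Student> lineMapper = new DefaultLineMapper<>();
        lineMapper.setLineTokenizer(tokenizer);
        lineMapper.setFieldSetMapper(fieldSetMapper);

        FlatFileItemReader<Student> reader = new FlatFileItemReader<>();
        // use FileSystemResource instead if the file lives in a directory rather than on the classpath
        reader.setResource(new ClassPathResource(isValid ? "input/students.csv" : "input/students_invalid.csv"));
        reader.setLineMapper(lineMapper);
        reader.setLinesToSkip(1); // skip the CSV header row
        return reader;
    }

    //@Bean
    public FlatFileItemWriter<Student> writer() {
        System.out.println("writer");
        // Extract the Student properties in column order; the aggregator joins them with ","
        BeanWrapperFieldExtractor<Student> fieldExtractor = new BeanWrapperFieldExtractor<>();
        fieldExtractor.setNames(new String[]{"id", "firstName", "lastName", "email"});

        DelimitedLineAggregator<Student> lineAggregator = new DelimitedLineAggregator<>();
        lineAggregator.setFieldExtractor(fieldExtractor);

        FlatFileItemWriter<Student> writer = new FlatFileItemWriter<>();
        writer.setResource(new FileSystemResource("output/students.csv"));
        writer.setHeaderCallback(headerWriter -> headerWriter.write("Id,First Name,Last Name,Email"));
        writer.setLineAggregator(lineAggregator);
        writer.setFooterCallback(footerWriter -> footerWriter.write("Created @ " + Instant.now()));
        return writer;
    }
}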

My last question is basically the same, but the data source is a database: e.g. reading a table containing 10k rows from dbA and writing them to dbB. Can I read 10 rows at a time, process them, and write them to dbB? If so, could you share some pseudocode?


Answer 1

Score: 0


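A chunk-oriented step in Spring Batch does not read the entire file or table at once. Instead, it streams data from the source in chunks of a configurable size. With 10k rows and a chunk size of 10, the step reads items one at a time until it has 10, passes each one through the processor, hands the 10 results to the writer in a single call, and then commits the transaction; this repeats 1,000 times. The output file is opened once when the step starts and stays open for the whole step, so each chunk is appended to it rather than reopening the file.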
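The loop below is a simplified, framework-free model of what a chunk-oriented step does, in the spirit of the pseudocode in the reference documentation. It is not Spring Batch's actual implementation: transactions, skip/retry handling and restart state are left out, an `Iterator` stands in for `ItemReader.read()` (which signals end of input by returning `null`), and the names `ChunkLoopSketch` and `runChunkedStep` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.stream.IntStream;

public class ChunkLoopSketch {

    // Simplified model of a chunk-oriented step (no transactions, skip/retry or restart state):
    // read items one at a time until the chunk is full or the input is exhausted, process each
    // item, then hand the whole chunk to the writer in one call. Returns the number of chunks.
    public static <I, O> int runChunkedStep(Iterator<I> reader, Function<I, O> processor,
                                            Consumer<List<O>> writer, int chunkSize) {
        int chunks = 0;
        while (reader.hasNext()) {
            List<O> chunk = new ArrayList<>(chunkSize); // only one chunk is held in memory
            while (chunk.size() < chunkSize && reader.hasNext()) {
                chunk.add(processor.apply(reader.next())); // processor runs once per item
            }
            writer.accept(chunk); // Spring Batch would commit the transaction after this write
            chunks++;
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Stand-in for a 10k-row CSV file: lines are generated lazily, never all at once
        Iterator<String> lines = IntStream.rangeClosed(1, 10_000)
                .mapToObj(i -> i + ",first" + i + ",last" + i + ",user" + i + "@example.com")
                .iterator();

        List<Integer> chunkSizes = new ArrayList<>();
        int chunks = runChunkedStep(
                lines,
                line -> line + "!",                    // processor
                chunk -> chunkSizes.add(chunk.size()), // writer: called once per chunk
                10);

        System.out.println("chunks written: " + chunks);
        System.out.println("largest chunk: " + chunkSizes.stream().max(Integer::compare).orElse(0));
    }
}
```

Running `main` prints `chunks written: 1000` and `largest chunk: 10`, matching the 10k/10 = 1k iterations described in the question.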

You can find more details about the processing model in the reference documentation here: Chunk-oriented Processing.
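For the database part of the question (10k rows from dbA to dbB, 10 at a time), the step keeps the same shape; only the reader and writer change. This is a configuration sketch, not a drop-in solution, assuming Spring Batch 5 with spring-jdbc, two `DataSource` beans hypothetically qualified as `dbA` and `dbB`, a hypothetical `student` table with `id, first_name, last_name, email` columns in both databases, a `Student` class with a no-arg constructor plus getters/setters, and a `PlatformTransactionManager` that manages dbB so that each chunk's inserts commit together.

```java
import javax.sql.DataSource;

import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.job.builder.JobBuilder;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.batch.core.step.builder.StepBuilder;
import org.springframework.batch.item.database.JdbcBatchItemWriter;
import org.springframework.batch.item.database.JdbcCursorItemReader;
import org.springframework.batch.item.database.builder.JdbcBatchItemWriterBuilder;
import org.springframework.batch.item.database.builder.JdbcCursorItemReaderBuilder;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.jdbc.core.BeanPropertyRowMapper;
import org.springframework.transaction.PlatformTransactionManager;

@Configuration
public class DbToDbJob {

    @Bean
    public JdbcCursorItemReader<Student> dbReader(@Qualifier("dbA") DataSource dbA) { // "dbA" is a hypothetical bean name
        // Opens one cursor over the source table and hands rows to the step one by one
        return new JdbcCursorItemReaderBuilder<Student>()
                .name("studentDbReader")
                .dataSource(dbA)
                .sql("SELECT id, first_name, last_name, email FROM student ORDER BY id") // assumed schema
                .rowMapper(new BeanPropertyRowMapper<>(Student.class)) // first_name -> firstName
                .fetchSize(10) // hint to the JDBC driver; does not limit the result set
                .build();
    }

    @Bean
    public JdbcBatchItemWriter<Student> dbWriter(@Qualifier("dbB") DataSource dbB) { // "dbB" is a hypothetical bean name
        // Each chunk is sent as one JDBC batch; :firstName etc. are read from Student getters
        return new JdbcBatchItemWriterBuilder<Student>()
                .dataSource(dbB)
                .sql("INSERT INTO student (id, first_name, last_name, email) "
                        + "VALUES (:id, :firstName, :lastName, :email)")
                .beanMapped()
                .build();
    }

    @Bean
    public Step dbStep(JobRepository jobRepository, PlatformTransactionManager transactionManager,
                       JdbcCursorItemReader<Student> dbReader, JdbcBatchItemWriter<Student> dbWriter) {
        return new StepBuilder("dbToDbStep", jobRepository)
                .<Student, Student>chunk(10, transactionManager) // 10 items per transaction
                .reader(dbReader)
                .processor(student -> student) // placeholder: put the real transformation here
                .writer(dbWriter)
                .build();
    }

    @Bean
    public Job dbToDbJob(JobRepository jobRepository, Step dbStep) {
        return new JobBuilder("dbToDbJob", jobRepository).start(dbStep).build();
    }
}
```

A `JdbcPagingItemReader` is the usual alternative when holding a cursor open for the whole step is undesirable: it issues one query per page instead.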

huangapple
  • Posted on 7 February 2023 at 00:55:41
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75364317.html