Springbatch read from csv, how does it work?
Question
I am new to Spring Batch, and I wonder how the reader/processor/writer works if I read a CSV file containing 10k rows, use a chunk size of 10, and output to another CSV file.
My question is:
Does Spring Batch load all 10k rows from the CSV at once, process them individually (10k times), and then store all of them in the destination file in one go? If so, what's the point of using Spring Batch? I could have three methods doing the same job, right?
Or:
Does Spring Batch open a stream over the CSV and, each time, read 10 rows, process those 10 rows, and open an output stream to write/append those 10 rows to the destination file? Basically repeating 10k / 10 = 1k times.
@Configuration
public class SampleJob3 {

    @Bean
    public Job job3(JobRepository jobRepository, PlatformTransactionManager transactionManager) {
        return new JobBuilder("Job3", jobRepository)
                .incrementer(new RunIdIncrementer()) // work with program args
                .start(step(jobRepository, transactionManager))
                .build();
    }

    private Step step(JobRepository jobRepository, PlatformTransactionManager transactionManager) {
        return new StepBuilder("Job3 Step started ")
                .<Student, Student>chunk(3)
                .repository(jobRepository)
                .transactionManager(transactionManager)
                .reader(reader(true))
                .processor(student -> {
                    System.out.println("processor");
                    return new Student(student.getId(), student.getFirstName() + "!",
                            student.getLastName() + "!", student.getEmail() + "!");
                })
                .writer(writer())
                .build();
    }

    private FlatFileItemReader<Student> reader(boolean isValid) {
        System.out.println("reader");
        FlatFileItemReader<Student> reader = new FlatFileItemReader<>();
        // use FileSystemResource if the file is stored in a directory instead of the resources folder
        reader.setResource(new PathMatchingResourcePatternResolver()
                .getResource(isValid ? "input/students.csv" : "input/students_invalid.csv"));
        reader.setLineMapper(new DefaultLineMapper<>() {{
            setLineTokenizer(new DelimitedLineTokenizer() {{
                setNames("ID", "First Name", "Last Name", "Email");
            }});
            setFieldSetMapper(new BeanWrapperFieldSetMapper<>() {{
                setTargetType(Student.class);
            }});
        }});
        reader.setLinesToSkip(1);
        return reader;
    }

    //@Bean
    public FlatFileItemWriter<Student> writer() {
        System.out.println("writer");
        FlatFileItemWriter<Student> writer = new FlatFileItemWriter<>();
        writer.setResource(new FileSystemResource("output/students.csv"));
        writer.setHeaderCallback(writer1 -> writer1.write("Id,First Name,Last Name, Email"));
        writer.setLineAggregator(new DelimitedLineAggregator<>() {{
            setFieldExtractor(new BeanWrapperFieldExtractor<>() {{
                setNames(new String[]{"id", "firstName", "lastName", "email"});
            }});
        }});
        writer.setFooterCallback(writer12 -> writer12.write("Created @ " + Instant.now()));
        return writer;
    }
}
My last question is basically the same, but the data source is a database, e.g. reading a table containing 10k rows from dbA and writing to dbB. Am I able to read 10 rows at a time, process them, and write them to dbB? If so, can you share some pseudocode?
Answer 1
Score: 0
A chunk-oriented step in Spring Batch will not read the entire file or table at once. Rather, it will stream data from the source in chunks (of a configurable size).
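Roughly speaking, the step you configured behaves like the loop below. This is only a conceptual sketch of the chunk-oriented model, not the actual Spring Batch implementation (which also handles transactions, restart metadata and fault tolerance); reader, processor, writer and chunkSize stand for the components and value you configured, and the types used (ItemReader, ItemProcessor, ItemWriter, Chunk) come from org.springframework.batch.item in Spring Batch 5:

    // Conceptual sketch only -- not the real chunk-oriented tasklet.
    static void runChunkOrientedStep(ItemReader<Student> reader,
                                     ItemProcessor<Student, Student> processor,
                                     ItemWriter<Student> writer,
                                     int chunkSize) throws Exception {
        boolean exhausted = false;
        while (!exhausted) {
            // 1. Read: one row at a time, until the chunk is full or the input ends.
            List<Student> items = new ArrayList<>();
            while (items.size() < chunkSize) {
                Student item = reader.read();   // reads ONE line of the CSV
                if (item == null) {             // end of input
                    exhausted = true;
                    break;
                }
                items.add(item);
            }
            if (items.isEmpty()) {
                break;
            }

            // 2. Process: each item individually; returning null filters an item out.
            List<Student> processed = new ArrayList<>();
            for (Student item : items) {
                Student out = processor.process(item);
                if (out != null) {
                    processed.add(out);
                }
            }

            // 3. Write: the whole chunk in one call, then the chunk's transaction commits.
            //    With 10k input rows and a chunk size of 10, this loop runs about 1,000 times.
            writer.write(new Chunk<>(processed));
        }
    }

So the full 10k rows are never held in memory at once; only the current chunk is.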
You can find more details about the processing model in the reference documentation here: Chunk-oriented Processing.
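For the database-to-database part of the question, the model is exactly the same; only the reader and writer implementations change. Below is a minimal, untested sketch assuming Spring Batch 5, two configured DataSource beans (named dataSourceA and dataSourceB here purely for illustration) and a Student bean whose properties match the columns; in a real application these methods would typically be @Bean definitions:

    import javax.sql.DataSource;

    import org.springframework.batch.core.Step;
    import org.springframework.batch.core.repository.JobRepository;
    import org.springframework.batch.core.step.builder.StepBuilder;
    import org.springframework.batch.item.database.JdbcBatchItemWriter;
    import org.springframework.batch.item.database.JdbcCursorItemReader;
    import org.springframework.batch.item.database.builder.JdbcBatchItemWriterBuilder;
    import org.springframework.batch.item.database.builder.JdbcCursorItemReaderBuilder;
    import org.springframework.jdbc.core.BeanPropertyRowMapper;
    import org.springframework.transaction.PlatformTransactionManager;

    public class CopyStepConfig {

        // Streams rows from dbA through a JDBC cursor; rows are fetched as the
        // cursor advances, never all 10k at once.
        JdbcCursorItemReader<Student> dbReader(DataSource dataSourceA) {
            return new JdbcCursorItemReaderBuilder<Student>()
                    .name("studentDbReader")
                    .dataSource(dataSourceA)
                    .sql("SELECT id, first_name, last_name, email FROM student")
                    .rowMapper(new BeanPropertyRowMapper<>(Student.class))
                    .build();
        }

        // Writes each chunk to dbB as one batched insert.
        JdbcBatchItemWriter<Student> dbWriter(DataSource dataSourceB) {
            return new JdbcBatchItemWriterBuilder<Student>()
                    .dataSource(dataSourceB)
                    .sql("INSERT INTO student (id, first_name, last_name, email) "
                            + "VALUES (:id, :firstName, :lastName, :email)")
                    .beanMapped()
                    .build();
        }

        Step copyStep(JobRepository jobRepository, PlatformTransactionManager transactionManager,
                      DataSource dataSourceA, DataSource dataSourceB) {
            return new StepBuilder("copyStep", jobRepository)
                    .<Student, Student>chunk(10, transactionManager) // 10 rows per chunk
                    .reader(dbReader(dataSourceA))
                    .processor(student -> student)                   // optional processing
                    .writer(dbWriter(dataSourceB))
                    .build();
        }
    }

With this setup the step reads 10 rows from dbA, processes them, and writes them to dbB per chunk, committing once per chunk; a JdbcPagingItemReader is an alternative reader that fetches the source table page by page instead of through a cursor.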