Whether to use Tasklet or Chunk in this scenario

Question

I have a job that reads the sub-folders of a given folder. The path is dynamic; we get it from a controller. Currently I use three Tasklets: one to read the sub-directories, another to process them into objects to be saved to the DB, and a last one to write the processed objects to the database.
A folder can have any number of sub-folders. Currently, I use this code:

Path start = Paths.get("x:\\data\\");
Stream<Path> stream = Files.walk(start, 1);
List<String> collect = stream
        .map(String::valueOf)
        .sorted()
        .collect(Collectors.toList());

to read all the sub-folders at once.
I followed the Tasklet implementation example at https://www.baeldung.com/spring-batch-tasklet-chunk for this. Is this the right approach? I also need to run the job asynchronously with multi-threading.
Because there can be a huge number of sub-folders, there can be a huge number of rows or lists of data to process and write to the database.

Please suggest an appropriate approach.
I am learning Spring Batch. I have done a few file read/process/write examples and used the chunk approach for those.
But my job is to read the sub-directories of a folder, so I cannot decide which approach to follow.
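
A side note on the Files.walk(start, 1) call in the question: with a depth of 1 it returns the start folder itself and any regular files directly inside it, not only the sub-folders, and the returned stream keeps a directory handle open until it is closed. The following is a minimal sketch, in plain Java with no Spring Batch involved, of one way to collect only the immediate sub-folders. The class and method names (SubfolderLister, listSubfolders) are made up for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of the original question or answer.
class SubfolderLister {

    /** Returns the immediate sub-directories of {@code start}, sorted by path. */
    static List<Path> listSubfolders(Path start) throws IOException {
        // try-with-resources closes the directory handle held by the stream.
        try (Stream<Path> paths = Files.walk(start, 1)) {
            return paths
                    .filter(p -> !p.equals(start)) // walk() also emits the start folder itself
                    .filter(Files::isDirectory)    // drop regular files found at depth 1
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Uses the first argument as the folder to scan, or the current directory.
        Path start = Paths.get(args.length > 0 ? args[0] : ".");
        listSubfolders(start).forEach(System.out::println);
    }
}
```

The resulting list could then be handed to whatever reads the items, whether that is a Tasklet or the reader of a chunk-oriented step.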

Answer 1

Score: 1

I have a similar scenario: I need to read all the files in a folder, process them, and write the results to a database (Doc). Here is my configuration, a chunk-oriented step run with a task executor:

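import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

import javax.sql.DataSource;

import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.StepExecution;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepScope;
import org.springframework.batch.core.launch.support.RunIdIncrementer;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.database.BeanPropertyItemSqlParameterSourceProvider;
import org.springframework.batch.item.database.JdbcBatchItemWriter;
import org.springframework.batch.item.database.builder.JdbcBatchItemWriterBuilder;
import org.springframework.batch.item.support.IteratorItemReader;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.task.TaskExecutor;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;

// Transaction, TransactionItemProcessor and ItemReaderListener are the
// author's own classes and are not shown here.
@Configuration
@EnableBatchProcessing
public class BatchConfig {

    @Bean
    public Job job(JobBuilderFactory jobBuilderFactory,
                   Step mainStep) {
        return jobBuilderFactory.get("MainJob")
                .incrementer(new RunIdIncrementer())
                .flow(mainStep)
                .end()
                .build();
    }

    @Bean
    public Step mainStep(StepBuilderFactory stepBuilderFactory,
                         JdbcBatchItemWriter<Transaction> writer,
                         ItemReader<String> reader,
                         TransactionItemProcessor processor) {
        return stepBuilderFactory.get("Main")
                .<String, Transaction>chunk(2)
                .reader(reader)
                .processor(processor)
                .writer(writer)
                // The task executor is what makes the step multi-threaded.
                .taskExecutor(jobTaskExecutor())
                .listener(new ItemReaderListener())
                .build();
    }

    @Bean
    public TaskExecutor jobTaskExecutor() {
        ThreadPoolTaskExecutor taskExecutor = new ThreadPoolTaskExecutor();
        taskExecutor.setCorePoolSize(2);
        taskExecutor.setMaxPoolSize(10);
        taskExecutor.afterPropertiesSet();
        return taskExecutor;
    }

    @Bean
    @StepScope
    public ItemReader<String> reader(@Value("#{stepExecution}") StepExecution stepExecution) throws IOException {
        Path start = Paths.get("D:\\test");
        List<String> inputFile;
        // Close the directory stream once the paths have been collected.
        try (Stream<Path> paths = Files.walk(start, 1)) {
            inputFile = paths
                    .map(String::valueOf)
                    .sorted()
                    .collect(Collectors.toList());
        }
        return new IteratorItemReader<>(inputFile);
    }

    @Bean
    @StepScope
    public TransactionItemProcessor processor(@Value("#{stepExecution}") StepExecution stepExecution) {
        return new TransactionItemProcessor();
    }

    @Bean
    @StepScope
    public JdbcBatchItemWriter<Transaction> writer(DataSource dataSource) {
        return new JdbcBatchItemWriterBuilder<Transaction>()
                .itemSqlParameterSourceProvider(new BeanPropertyItemSqlParameterSourceProvider<>())
                .sql("INSERT INTO transaction (id, date, type) VALUES (:id, :date, :type)")
                .dataSource(dataSource)
                .build();
    }
}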
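
One thing to watch when a step is given a task executor: the reader is then called from several threads at once. As far as I know, IteratorItemReader does not synchronize its read() method, so it may need a thread-safe wrapper or replacement in that setup. Below is a minimal sketch of the idea in plain Java. SimpleItemReader is a hypothetical stand-in for Spring Batch's ItemReader contract (read() returns null when the input is exhausted), and SynchronizedIteratorReader is a made-up name; neither is a Spring Batch class.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in mirroring the read() contract of Spring Batch's ItemReader. */
interface SimpleItemReader<T> {
    T read(); // returns null once the input is exhausted
}

/** Hands out each element exactly once, even when read() is called from several threads. */
class SynchronizedIteratorReader<T> implements SimpleItemReader<T> {

    private final Iterator<T> iterator;

    SynchronizedIteratorReader(Iterable<T> items) {
        this.iterator = items.iterator();
    }

    @Override
    public synchronized T read() {
        // hasNext() and next() must run as one atomic step; otherwise two threads
        // can both pass hasNext() for the last element and one of them fails in next().
        return iterator.hasNext() ? iterator.next() : null;
    }
}

class SynchronizedReaderDemo {
    public static void main(String[] args) throws InterruptedException {
        List<String> folders = Arrays.asList("D:\\test\\a", "D:\\test\\b", "D:\\test\\c");
        SimpleItemReader<String> reader = new SynchronizedIteratorReader<>(folders);
        AtomicInteger processed = new AtomicInteger();

        // Two worker threads drain the same reader, as a multi-threaded step would.
        Runnable worker = () -> {
            while (reader.read() != null) {
                processed.incrementAndGet();
            }
        };
        Thread first = new Thread(worker);
        Thread second = new Thread(worker);
        first.start();
        second.start();
        first.join();
        second.join();

        System.out.println("Items read: " + processed.get());
    }
}
```

To use the same idea inside Spring Batch, the class would implement org.springframework.batch.item.ItemReader instead of the stand-in interface; its read() method is declared with throws Exception.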

huangapple
  • Posted on October 21, 2020, at 22:06:14
  • When reposting, please keep the link to this article: https://go.coder-hub.com/64465361.html