To use Tasklet or Chunk in this scenario
Question
I have a job/task to read the sub-folders/directories of a given folder/path. The path is dynamic; we get it from a Controller. Currently I have used Tasklets: there are 3 tasklets, one to read the sub-directories, another to process them to prepare objects to save to the DB, and the last one to write the processed data objects to a database.
The folders can have any number of sub-folders. Currently, I have used this code to read all the sub-folders at once:
Path start = Paths.get("x:\\data\\");
Stream<Path> stream = Files.walk(start, 1);
List<String> collect = stream
        .map(String::valueOf)
        .sorted()
        .collect(Collectors.toList());
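(Side note: Files.walk(start, 1) also returns the starting folder itself and any regular files at that level. A minimal sketch that keeps only the sub-directories, assuming that is the intent, could look like this, using the same java.nio.file / java.util.stream imports as the snippet above:)
// Hedged sketch: restrict the walk to sub-directories and close the stream when done.
Path start = Paths.get("x:\\data\\");
try (Stream<Path> stream = Files.walk(start, 1)) {
    List<String> subDirs = stream
            .filter(Files::isDirectory)           // keep directories, drop regular files
            .filter(path -> !path.equals(start))  // drop the starting folder itself
            .map(String::valueOf)
            .sorted()
            .collect(Collectors.toList());
}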
I followed this Tasklet example for the purpose: https://www.baeldung.com/spring-batch-tasklet-chunk. Is this the right approach? I also need to run the Job asynchronously with multi-threading.
As there can be a huge number of sub-folders, there can be a huge number of rows, or a large list of data, to process and write to the database.
Please suggest an appropriate approach.
I am learning Spring Batch, have done a few examples on file read/process/write, and used the Chunk approach for those.
But my job is to read sub-directories of a folder/path, so I cannot decide which approach to follow.
Answer 1
Score: 1
I have a similar scenario: I need to read all the files from a folder, process them, and write them to a database (Doc):
@Configuration
@EnableBatchProcessing
public class BatchConfig {

    // The job simply wraps the single chunk-oriented step defined below.
    @Bean
    public Job job(JobBuilderFactory jobBuilderFactory,
                   Step mainStep) {
        return jobBuilderFactory.get("MainJob")
                .incrementer(new RunIdIncrementer())
                .flow(mainStep)
                .end()
                .build();
    }

    // Chunk-oriented step: reads folder paths as Strings, maps each one to a
    // Transaction, and writes them in commit intervals of 2. The TaskExecutor
    // makes the step multi-threaded.
    @Bean
    public Step mainStep(StepBuilderFactory stepBuilderFactory,
                         JdbcBatchItemWriter<Transaction> writer,
                         ItemReader<String> reader,
                         TransactionItemProcessor processor) {
        return stepBuilderFactory.get("Main")
                .<String, Transaction>chunk(2)
                .reader(reader)
                .processor(processor)
                .writer(writer)
                .taskExecutor(jobTaskExecutor())
                .listener(new ItemReaderListener())
                .build();
    }

    @Bean
    public TaskExecutor jobTaskExecutor() {
        ThreadPoolTaskExecutor taskExecutor = new ThreadPoolTaskExecutor();
        taskExecutor.setCorePoolSize(2);
        taskExecutor.setMaxPoolSize(10);
        taskExecutor.afterPropertiesSet();
        return taskExecutor;
    }

    // Step-scoped reader: walks one level of the folder and feeds each path to the
    // step as a String item. The injected StepExecution gives access to job
    // parameters if the path needs to be dynamic.
    @Bean
    @StepScope
    public ItemReader<String> reader(@Value("#{stepExecution}") StepExecution stepExecution) throws IOException {
        Path start = Paths.get("D:\\test");
        List<String> inputFile = Files.walk(start, 1)
                .map(String::valueOf)
                .sorted()
                .collect(Collectors.toList());
        return new IteratorItemReader<>(inputFile);
    }

    @Bean
    @StepScope
    public TransactionItemProcessor processor(@Value("#{stepExecution}") StepExecution stepExecution) {
        return new TransactionItemProcessor();
    }

    // Writes each processed Transaction to the database with a batched insert.
    @Bean
    @StepScope
    public JdbcBatchItemWriter<Transaction> writer(DataSource dataSource) {
        return new JdbcBatchItemWriterBuilder<Transaction>()
                .itemSqlParameterSourceProvider(new BeanPropertyItemSqlParameterSourceProvider<>())
                .sql("INSERT INTO transaction (id, date, type) VALUES (:id, :date, :type)")
                .dataSource(dataSource)
                .build();
    }
}
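The question also needs the folder path to come from a Controller and the job to be launched asynchronously, which the configuration above does not show. Below is a minimal, hypothetical sketch of that wiring; the endpoint, the inputPath parameter name and the FolderJobController class are illustrative assumptions, not part of the answer above.
import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class FolderJobController {

    private final JobLauncher jobLauncher; // ideally an async launcher, see note below
    private final Job job;                 // the "MainJob" defined in the answer's config

    public FolderJobController(JobLauncher jobLauncher, Job job) {
        this.jobLauncher = jobLauncher;
        this.job = job;
    }

    // Receives the dynamic folder path and hands it to the job as a job parameter.
    @PostMapping("/folders/import")
    public ResponseEntity<String> run(@RequestParam String path) throws Exception {
        JobParameters params = new JobParametersBuilder()
                .addString("inputPath", path)                      // dynamic path from the caller
                .addLong("startedAt", System.currentTimeMillis())  // makes each launch unique
                .toJobParameters();
        JobExecution execution = jobLauncher.run(job, params);
        return ResponseEntity.accepted().body("Started job execution " + execution.getId());
    }
}
With something like this in place, the reader bean above could drop the hard-coded D:\test and receive the path through a step-scoped @Value("#{jobParameters['inputPath']}") String inputPath argument. To make jobLauncher.run() return immediately instead of blocking the web request, the injected JobLauncher can be a SimpleJobLauncher configured with launcher.setTaskExecutor(new SimpleAsyncTaskExecutor()).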