Question

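I have a job that needs to read the sub-folders (sub-directories) of a given folder/path. The path is dynamic; we get it from a Controller. Currently I use Tasklets. There are three: one reads the sub-directories, one processes them to prepare the objects to save to the DB, and the last one writes the processed data objects to the database.
A folder can have any number of sub-folders. Currently I use this code: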

Path start = Paths.get("x:\\data\\");
Stream<Path> stream = Files.walk(start, 1);
List<String> collect = stream
        .map(String::valueOf)
        .sorted()
        .collect(Collectors.toList());

to read all the sub-folders at once.
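
One detail worth noting about the snippet above: `Files.walk(start, 1)` returns the start directory itself and any regular files as well as the sub-directories, and the returned stream should be closed after use. A minimal sketch that lists only the immediate sub-directories could look like this; the class and method names (`SubDirectoryLister`, `listSubDirectories`) are made up for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class SubDirectoryLister {

    /** Returns the sorted paths of the immediate sub-directories of {@code start}. */
    public static List<String> listSubDirectories(Path start) throws IOException {
        // Files.list yields only the direct children, so the root itself is not included.
        // The stream holds an open directory handle; try-with-resources closes it.
        try (Stream<Path> children = Files.list(start)) {
            return children
                    .filter(Files::isDirectory) // skip regular files
                    .map(Path::toString)
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: pass the folder as the first argument, default to the current directory.
        Path start = Paths.get(args.length > 0 ? args[0] : ".");
        listSubDirectories(start).forEach(System.out::println);
    }
}
```

The resulting list can be handed to a Tasklet or to an item reader in the same way as the original `collect` list.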
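I followed the Tasklet implementation in this example: https://www.baeldung.com/spring-batch-tasklet-chunk. Is this the right approach? I also need to run the Job asynchronously with multi-threading.
Because there can be a huge number of sub-folders, there can also be a huge number of rows (or a large list of data) to process and write to the database.

Please suggest an appropriate approach.
I am learning Spring Batch. I have worked through a few file read/process/write examples, and I used the Chunk approach for those.
But my job is to read the sub-directories of a folder/path, so I cannot decide which approach to follow.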

Answer 1

Score: 1

I have a similar scenario: I need to read all the files from a folder, process them, and write them to the database (Doc).

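import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import javax.sql.DataSource;

import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.StepExecution;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepScope;
import org.springframework.batch.core.launch.support.RunIdIncrementer;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.database.BeanPropertyItemSqlParameterSourceProvider;
import org.springframework.batch.item.database.JdbcBatchItemWriter;
import org.springframework.batch.item.database.builder.JdbcBatchItemWriterBuilder;
import org.springframework.batch.item.support.IteratorItemReader;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.task.TaskExecutor;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;

// Transaction, TransactionItemProcessor and ItemReaderListener are application classes not shown here.
@Configuration
@EnableBatchProcessing
public class BatchConfig {

    // Single-step job; the parameter name matches the mainStep bean below.
    @Bean
    public Job job(JobBuilderFactory jobBuilderFactory,
                   Step mainStep) {
        return jobBuilderFactory.get("MainJob")
                .incrementer(new RunIdIncrementer())
                .flow(mainStep)
                .end()
                .build();
    }

    // Chunk-oriented step: read paths, process them into Transaction objects, write in chunks of 2.
    @Bean
    public Step mainStep(StepBuilderFactory stepBuilderFactory,
                         JdbcBatchItemWriter<Transaction> writer,
                         ItemReader<String> reader,
                         TransactionItemProcessor processor) {
        return stepBuilderFactory.get("Main")
                .<String, Transaction>chunk(2)
                .reader(reader)
                .processor(processor)
                .writer(writer)
                .taskExecutor(jobTaskExecutor()) // makes the step multi-threaded
                .listener(new ItemReaderListener())
                .build();
    }

    @Bean
    public TaskExecutor jobTaskExecutor() {
        ThreadPoolTaskExecutor taskExecutor = new ThreadPoolTaskExecutor();
        taskExecutor.setCorePoolSize(2);
        taskExecutor.setMaxPoolSize(10);
        taskExecutor.afterPropertiesSet();
        return taskExecutor;
    }

    @Bean
    @StepScope
    public ItemReader<String> reader(@Value("#{stepExecution}") StepExecution stepExecution) throws IOException {
        Path start = Paths.get("D:\\test");
        List<String> inputFiles;
        // Files.walk keeps a directory handle open, so close the stream when done.
        try (Stream<Path> paths = Files.walk(start, 1)) {
            inputFiles = paths
                    .filter(path -> !path.equals(start)) // walk() also returns the start directory itself
                    .map(String::valueOf)
                    .sorted()
                    .collect(Collectors.toList());
        }
        IteratorItemReader<String> delegate = new IteratorItemReader<>(inputFiles);
        // The step is multi-threaded and IteratorItemReader is not synchronized, so serialize read().
        return () -> {
            synchronized (delegate) {
                return delegate.read();
            }
        };
    }

    @Bean
    @StepScope
    public TransactionItemProcessor processor(@Value("#{stepExecution}") StepExecution stepExecution) {
        return new TransactionItemProcessor();
    }

    @Bean
    @StepScope
    public JdbcBatchItemWriter<Transaction> writer(DataSource dataSource) {
        return new JdbcBatchItemWriterBuilder<Transaction>()
                .itemSqlParameterSourceProvider(new BeanPropertyItemSqlParameterSourceProvider<>())
                .sql("INSERT INTO transaction (id, date, type) VALUES (:id, :date, :type)")
                .dataSource(dataSource)
                .build();
    }
}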
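
Since the question says the path comes from a Controller, one possible way to avoid the hard-coded `D:\\test` is to pass the path as a job parameter and bind it late in a step-scoped reader. The sketch below assumes Spring Batch 4.x with Spring MVC on the classpath. The parameter names `inputDir` and `requestedAt`, the `/import` endpoint, and the class names are hypothetical and not part of the original answer. The `subDirectoryReader` bean would take the place of the `reader` bean in the configuration above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.configuration.annotation.StepScope;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.support.IteratorItemReader;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@Configuration
class DynamicPathReaderConfig {

    // Step scope delays bean creation until the step runs, so the
    // job parameter "inputDir" (hypothetical name) is available here.
    @Bean
    @StepScope
    public ItemReader<String> subDirectoryReader(
            @Value("#{jobParameters['inputDir']}") String inputDir) throws IOException {
        Path start = Paths.get(inputDir);
        try (Stream<Path> children = Files.list(start)) {
            List<String> subDirs = children
                    .filter(Files::isDirectory)
                    .map(Path::toString)
                    .sorted()
                    .collect(Collectors.toList());
            // Not thread-safe on its own; synchronize read() if the step is multi-threaded.
            return new IteratorItemReader<>(subDirs);
        }
    }
}

@RestController
class ImportController {

    private final JobLauncher jobLauncher;
    private final Job job;

    ImportController(JobLauncher jobLauncher, Job job) {
        this.jobLauncher = jobLauncher;
        this.job = job;
    }

    @PostMapping("/import")
    public String launch(@RequestParam String path) throws Exception {
        JobParameters params = new JobParametersBuilder()
                .addString("inputDir", path)
                // A changing parameter gives every request its own JobInstance.
                .addLong("requestedAt", System.currentTimeMillis())
                .toJobParameters();
        jobLauncher.run(job, params);
        return "started";
    }
}
```

With the default launcher configuration, `jobLauncher.run` blocks until the job finishes. To return from the controller immediately, the launcher has to be configured with an asynchronous `TaskExecutor`.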
