Spring Batch: Write processed records to file


Question

This is a continuation of my previous question, as the original question is closed.

As per the accepted answer, a tasklet can be used. I have also tried implementing a custom item writer in a chunk-oriented step, which uses Jackson / JsonFileItemWriter. Can we use this, or does it have any performance impact?

public void write(final List<? extends Person> persons) throws Exception {

    for (Person person : persons) {
        objectMapper.writeValue(new File("D:/cp/dataTwo.json"), person);
    }

}

Question 1: "Is the above approach recommended?"

Question 2: "Can we generate the file in the item processor itself and use a no-op item writer?"

Can someone please help?

Answer 1

Score: 2

If you look at the Spring Batch framework, a chunk-oriented step consists of three stages: reading, processing, and writing.

This means it separates input and output into distinct operations. So if you plan to mix writing and processing together, it basically defeats that purpose and introduces tight coupling, which might hurt performance in the long run. (Think of it as a map-reduce operation: the stages need to be mutually exclusive, with clearly defined inputs and outputs.)

Now, as for whether this is recommended: yes. If you are using Spring Batch, this is the best way to process the records: read them in chunks, and then write them in chunks. Batch jobs are usually used to process isolated tasks so that, when the time comes, the work can be executed in parallel. So as long as you are not modifying the same file concurrently, you should be good to go with this approach.
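One detail worth checking in the question's snippet: it targets a single fixed path, so each writeValue call would replace what the previous one wrote. Below is a minimal sketch of a chunk-style writer that derives one file name per record instead. It is plain Java with no Spring Batch or Jackson dependency; the Person class, the person-<id>.json naming scheme, and the hand-rolled JSON string are all assumptions made for illustration, not part of the original code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical stand-in for the Person class from the question.
final class Person {
    final int id;
    final String name;

    Person(int id, String name) {
        this.id = id;
        this.name = name;
    }
}

// Writes every person in a chunk to its own file, person-<id>.json (assumed naming).
final class PerPersonFileWriter {
    private final Path outputDir;

    PerPersonFileWriter(Path outputDir) {
        this.outputDir = outputDir;
    }

    // Same shape as the write(List) method in the question, minus the framework.
    void write(List<? extends Person> persons) {
        for (Person person : persons) {
            Path target = outputDir.resolve("person-" + person.id + ".json");
            // Hand-rolled JSON for illustration; it does not escape special characters.
            String json = "{\"id\":" + person.id + ",\"name\":\"" + person.name + "\"}";
            try {
                Files.writeString(target, json);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }
}

final class PerPersonFileWriterDemo {
    // Writes a two-item chunk into a temp directory and returns that directory.
    static Path run() {
        try {
            Path dir = Files.createTempDirectory("persons");
            new PerPersonFileWriter(dir).write(List.of(new Person(1, "Ann"), new Person(2, "Bob")));
            return dir;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Wrote files to " + run());
    }
}
```

In a real job the same loop body would presumably sit inside the item writer and delegate serialization to the ObjectMapper already used in the question.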

Answer 2

Score: 2

> Question 1: Is the above approach recommended?

  • Your case is sequential and you have to write one file per record, so you gain no additional advantage from a writer that receives a chunk of records.

  • If any error happens in your writer, Spring Batch will have to retry the whole chunk and rewrite the files that had succeeded so far in that chunk, because Spring Batch wouldn't know which record in the chunk failed to write. So I see it as a downside compared to the tasklet-based answer on the other question.
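The rewrite effect described in the second point can be illustrated with a small simulation. This is a simplified sketch in plain Java, not Spring Batch's actual retry machinery: the class name, the string items standing in for records, and the rule that item "c" fails exactly once are all made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified simulation of a whole-chunk retry; NOT Spring Batch's retry machinery.
final class ChunkRetrySimulation {
    // Log of every simulated "file write", in the order it happened.
    final List<String> writes = new ArrayList<>();
    private boolean failedOnce = false;

    // Pretends to write one file per item; fails once when it reaches item "c".
    void writeChunk(List<String> items) {
        for (String item : items) {
            if (item.equals("c") && !failedOnce) {
                failedOnce = true;
                throw new IllegalStateException("simulated write failure for " + item);
            }
            writes.add(item);
        }
    }

    // Retries the whole chunk, because the caller cannot tell which item failed.
    void writeWithRetry(List<String> chunk, int maxAttempts) {
        for (int attempt = 1; ; attempt++) {
            try {
                writeChunk(chunk);
                return;
            } catch (IllegalStateException e) {
                if (attempt == maxAttempts) {
                    throw e;
                }
            }
        }
    }

    // Runs one four-item chunk with up to three attempts and returns the write log.
    static List<String> run() {
        ChunkRetrySimulation sim = new ChunkRetrySimulation();
        sim.writeWithRetry(List.of("a", "b", "c", "d"), 3);
        return sim.writes;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [a, b, a, b, c, d]
    }
}
```

Items "a" and "b" appear twice in the log: they were written successfully on the first attempt and written again when the chunk was retried.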

> Question 2: Can we generate the file in the item processor itself and use a no-op item writer?

  • I don't see a big performance issue or error-handling issue here, since the work is done record by record even if an empty no-op writer is invoked for every chunk. But Spring must be caching the chunk before passing it to the writer, so that if the writer throws a skippable exception, it can retry. So even if you use a no-op writer, the chunk will still be cached, and I don't know how quickly that cache will be cleared given that yours is a no-op writer.

  • I am very uneasy about this approach from a best-practices point of view: if a new developer joins, they will not go looking into your processor to discover that it is acting as the writer.
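To make the shape of that second approach concrete, here is a rough sketch in plain Java. Function and Consumer are stand-ins for an item processor and an item writer, not Spring Batch's interfaces, and the in-memory list of file names is an assumption that replaces real file I/O.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// Sketch of "processor writes the file, writer does nothing"; all types are stand-ins.
final class ProcessorAsWriterSketch {
    // Stand-in for the file system: each entry represents one generated file.
    final List<String> generatedFiles = new ArrayList<>();

    // Stand-in for an item processor that also performs the output side effect.
    final Function<String, String> processor = name -> {
        generatedFiles.add(name + ".json");
        return name;
    };

    // Stand-in for a no-op item writer: it receives the chunk and ignores it.
    final Consumer<List<String>> noOpWriter = chunk -> { };

    // Processes items one at a time, then hands the collected chunk to the writer.
    void runChunk(List<String> items) {
        List<String> processed = new ArrayList<>();
        for (String item : items) {
            processed.add(processor.apply(item));
        }
        noOpWriter.accept(processed);
    }

    // Runs one two-item chunk and returns the names of the "files" produced.
    static List<String> run() {
        ProcessorAsWriterSketch sketch = new ProcessorAsWriterSketch();
        sketch.runChunk(List.of("ann", "bob"));
        return sketch.generatedFiles;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [ann.json, bob.json]
    }
}
```

The output is produced entirely inside the processor, which is exactly the readability concern raised above: nothing in the writer hints at where the files come from.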

Summary

I would go with the tasklet-based approach from the other question.

huangapple
  • Posted on July 30, 2020 at 01:53:49
  • When reposting, please keep the link to this article: https://go.coder-hub.com/63159661.html