How do I write multiple output files using CoRB?

Question

By default, when I run a CoRB job that returns data from the process function, that data is streamed into a single file on the CoRB client. I need to write the output to separate files instead, one file per URI being processed. How do I write CoRB output into multiple files rather than one large file?

My CoRB job currently returns the URI, and those URIs are streamed into one output file with each URI on a new line. I would prefer a directory filled with files, one file per URI.

Answer 1

Score: 2

CoRB has two built-in Tasks that can be used to write the output of the PROCESS-MODULE to the filesystem.

  • ExportBatchToFileTask generates a single file and is typically used for reports. It writes the data returned by the PROCESS-MODULE to the single file specified by EXPORT-FILE-NAME. All values returned during the entire CoRB job are streamed into that one file.
  • ExportToFileTask generates multiple files. It saves the document returned by each invocation of the PROCESS-MODULE to a separate local file within EXPORT-FILE-DIR, where the file name for each document is based on the URI.

It is common to write CoRB jobs that generate a CSV or other report by appending the output of each PROCESS-MODULE execution to a single file.
If you specify the EXPORT-FILE-NAME option, CoRB automatically uses ExportBatchToFileTask by setting the PROCESS-TASK option for you (unless you have explicitly set PROCESS-TASK yourself):

PROCESS-TASK=com.marklogic.developer.corb.ExportBatchToFileTask
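
For comparison, a single-file report job might be configured roughly as in the sketch below. The connection string, module names, thread count, and file names are placeholders chosen for illustration and do not come from the original question; only the option names are CoRB's.

```properties
# Hypothetical report job: every value returned by the process module
# is appended to one file. All values here are placeholders.
XCC-CONNECTION-URI=xcc://user:password@localhost:8000
URIS-MODULE=uris.xqy|ADHOC
PROCESS-MODULE=report.xqy|ADHOC
THREAD-COUNT=4
EXPORT-FILE-DIR=/tmp/export
EXPORT-FILE-NAME=report.csv
# PROCESS-TASK is deliberately left unset, so CoRB is expected to
# select ExportBatchToFileTask because EXPORT-FILE-NAME is present.
```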

However, if you would prefer to have the results of each process module execution saved as its own output file, for a multi-threaded download/export, then you would want to configure the ExportToFileTask. It uses the URI sent to the process module to construct a directory structure and filename, and saves the results of the transform to that file path.

You can set the EXPORT-FILE-DIR to provide a base directory in which to write out those files.

So, to configure CoRB to write the results of each PROCESS-MODULE execution to its own file, set the following options for your CoRB job:

PROCESS-TASK=com.marklogic.developer.corb.ExportToFileTask
EXPORT-FILE-DIR=/tmp/export
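
A fuller options file for the one-file-per-URI case might look like the following sketch. As before, the connection string, module names, and thread count are assumptions made for illustration; the two settings that matter are PROCESS-TASK and EXPORT-FILE-DIR.

```properties
# Hypothetical export job: each process module result is written to
# its own file under EXPORT-FILE-DIR. All values here are placeholders.
XCC-CONNECTION-URI=xcc://user:password@localhost:8000
URIS-MODULE=uris.xqy|ADHOC
PROCESS-MODULE=export.xqy|ADHOC
THREAD-COUNT=8
# Explicitly select the per-URI task instead of the batch task.
PROCESS-TASK=com.marklogic.developer.corb.ExportToFileTask
# Base directory; file paths beneath it are derived from each URI.
EXPORT-FILE-DIR=/tmp/export
```

With settings like these, a URI such as /orders/2023/1001.xml (a made-up example) would be expected to end up at a path like /tmp/export/orders/2023/1001.xml. It is worth running the job against a handful of URIs first to confirm the resulting layout.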

huangapple
  • Posted on June 5, 2023 at 22:29:36
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76407469.html