How do I write multiple output files using CoRB?
Question
By default, when I run a CoRB job that returns data from the process function, that data is streamed into a single file on the CoRB client. I need to write the output to different files instead, one file per URI being processed. How do I write CoRB output to multiple files instead of one large file?
I have a CoRB job that currently returns the URI, and those URIs are streamed together into one output file, one URI per line. I would prefer a directory of files, with one file per URI.
Answer 1
Score: 2
CoRB has two built-in Tasks that can be used to write the output of the PROCESS-MODULE to the filesystem.
ExportBatchToFileTask
Generates a single file, typically used for reports. Writes the data returned by the PROCESS-MODULE to a single file specified by EXPORT-FILE-NAME. All values returned over the entire CoRB job are streamed into that one file.
ExportToFileTask
Generates multiple files. Saves the documents returned by each invocation of PROCESS-MODULE to a separate local file within EXPORT-FILE-DIR, where each file name is based on the document's URI.
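To make the difference between the two tasks concrete, here is a minimal local simulation in Java (the language CoRB itself is written in). It does not call CoRB: the class and method names are hypothetical, each PROCESS-MODULE result is assumed to be a plain string keyed by its URI, and the per-URI path rule (strip a leading "/" and resolve the rest under the export directory) is an assumption about how a URI becomes a file path, not CoRB's source code.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical simulation of the two export styles; not CoRB source code.
public class ExportModes {

    // Batch style (like ExportBatchToFileTask): every result is appended
    // as one line to a single file.
    static void exportBatch(Path file, Map<String, String> resultsByUri) throws IOException {
        Files.createDirectories(file.toAbsolutePath().getParent());
        for (String result : resultsByUri.values()) {
            Files.writeString(file, result + System.lineSeparator(), StandardCharsets.UTF_8,
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        }
    }

    // Per-URI style (like ExportToFileTask): each result goes to its own file.
    // Assumption: the relative path is the URI with any leading '/' removed.
    static void exportPerUri(Path baseDir, Map<String, String> resultsByUri) throws IOException {
        for (Map.Entry<String, String> e : resultsByUri.entrySet()) {
            String uri = e.getKey();
            String relative = uri.startsWith("/") ? uri.substring(1) : uri;
            Path target = baseDir.resolve(relative);
            Files.createDirectories(target.getParent()); // build the directory structure
            Files.writeString(target, e.getValue(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempDirectory("corb-demo");
        Map<String, String> results = new LinkedHashMap<>();
        results.put("/docs/a.xml", "/docs/a.xml");
        results.put("/docs/sub/b.xml", "/docs/sub/b.xml");

        exportBatch(tmp.resolve("report.txt"), results);
        exportPerUri(tmp.resolve("export"), results);

        System.out.println(Files.readAllLines(tmp.resolve("report.txt")).size()); // 2 lines in one file
        System.out.println(Files.exists(tmp.resolve("export/docs/sub/b.xml")));  // true
    }
}
```

The batch style yields one file with one line per result (the situation described in the question), while the per-URI style yields a directory tree mirroring the URIs.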
It is common to write CoRB jobs that generate a CSV or other report by appending the output of each PROCESS-MODULE execution to a single file.
If you specify the EXPORT-FILE-NAME option, CoRB automatically uses ExportBatchToFileTask by setting the PROCESS-TASK option for you (unless you have explicitly set PROCESS-TASK yourself):
PROCESS-TASK=com.marklogic.developer.corb.ExportBatchToFileTask
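The defaulting rule above can be modeled as a small decision function. This is a hypothetical sketch of the rule as described, not CoRB's implementation; `effectiveProcessTask` and `com.example.MyTask` are made-up names for illustration.

```java
import java.util.Properties;

// Hypothetical model of the PROCESS-TASK defaulting rule; not CoRB's code.
public class ProcessTaskDefault {

    static final String BATCH_TASK = "com.marklogic.developer.corb.ExportBatchToFileTask";

    // Returns the PROCESS-TASK a job would effectively use,
    // or null when none is configured and none is implied.
    static String effectiveProcessTask(Properties options) {
        String explicit = options.getProperty("PROCESS-TASK");
        if (explicit != null) {
            return explicit; // an explicit PROCESS-TASK always wins
        }
        if (options.getProperty("EXPORT-FILE-NAME") != null) {
            return BATCH_TASK; // EXPORT-FILE-NAME implies the single-file batch task
        }
        return null;
    }

    public static void main(String[] args) {
        Properties opts = new Properties();
        opts.setProperty("EXPORT-FILE-NAME", "report.csv");
        System.out.println(effectiveProcessTask(opts));
        // prints com.marklogic.developer.corb.ExportBatchToFileTask

        opts.setProperty("PROCESS-TASK", "com.example.MyTask"); // hypothetical task class
        System.out.println(effectiveProcessTask(opts));
        // prints com.example.MyTask
    }
}
```

This is why setting EXPORT-FILE-NAME alone is enough for single-file reports, while per-URI output requires choosing a different PROCESS-TASK explicitly.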
However, if you would prefer the results of each process module execution to be saved as their own output file, for a multi-threaded download/export, configure the ExportToFileTask. It uses the URI sent to the process module to construct a directory structure and file name, and saves the results of the transform to that file path.
You can set the EXPORT-FILE-DIR to provide a base directory in which to write out those files.
So, to configure CoRB to write the results of each PROCESS-MODULE execution to its own file, set the following options for your CoRB job:
PROCESS-TASK=com.marklogic.developer.corb.ExportToFileTask
EXPORT-FILE-DIR=/tmp/export
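CoRB options can also be passed as JVM system properties (`-DNAME=value`) when launching `com.marklogic.developer.corb.Manager`. The sketch below only assembles such a command line; it does not start CoRB. The jar names, connection string, and module names are placeholders you would replace with your own, and `buildCommand` is a hypothetical helper.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: build (but do not start) a CoRB launch command that
// passes options as system properties. All values below are placeholders.
public class CorbCommand {

    static List<String> buildCommand(String classpath, Map<String, String> options) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-cp");
        cmd.add(classpath);
        for (Map.Entry<String, String> e : options.entrySet()) {
            cmd.add("-D" + e.getKey() + "=" + e.getValue()); // one -D flag per CoRB option
        }
        cmd.add("com.marklogic.developer.corb.Manager");
        return cmd;
    }

    public static void main(String[] args) {
        Map<String, String> options = new LinkedHashMap<>();
        options.put("XCC-CONNECTION-URI", "xcc://user:password@localhost:8000"); // placeholder
        options.put("URIS-MODULE", "get-uris.xqy");      // placeholder module
        options.put("PROCESS-MODULE", "return-uri.xqy"); // placeholder module
        options.put("PROCESS-TASK", "com.marklogic.developer.corb.ExportToFileTask");
        options.put("EXPORT-FILE-DIR", "/tmp/export");

        // Placeholder jar names, joined with the platform's classpath separator.
        String classpath = String.join(File.pathSeparator, "corb.jar", "marklogic-xcc.jar");
        List<String> cmd = buildCommand(classpath, options);
        System.out.println(String.join(" ", cmd));
        // To actually launch it: new ProcessBuilder(cmd).inheritIO().start();
    }
}
```

Keeping the options in an ordered map makes it easy to swap EXPORT-FILE-DIR or PROCESS-TASK per job without editing the rest of the command.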