Generating executable .ml test cases from a glob of plaintext files using dune

Question

I'm in the process of writing a test suite for some lexing/parsing and it would be much cleaner if I could drop test input/output files in a directory and have dune generate OCaml test cases for each of these during a step in compilation.

I figured I could use dune for this, very much inspired by this documentation page (Preprocessors and PPXs), but I'm struggling at getting it to work. I've essentially come to 2 dead ends:

  1. An alias rule that would execute a script padding each of the test files seemingly wouldn't work:

    (tests
      (names lexer)
      (libraries llvmlexer llvmparser ounit2))
    
    (rule
      (alias runtest)
      (deps (glob_files %{workspace_root}/**/*.ll))
        (action (system "./preprocess-lexer.sh '%{input-file}'")))
    

As it errors with:

File "test/dune", line 9, characters 41-54:
9 |  (action (system "./preprocess-lexer.sh '%{input-file}'")))
                                             ^^^^^^^^^^^^^
Error: %{input-file} isn't allowed in this position.

I'm very confused by this. Is this a matter of executing the action once for all files? If so is it possible to execute it once for each dependency?

  2. Specifying all sources/targets explicitly wouldn't work either, as that would entail listing them all in the dune file, since wildcard rules are apparently still not a thing: https://github.com/ocaml/dune/issues/307

Answer 1

Score: 3

Indeed, dune doesn't support wildcard rules at the time of writing. It does, however, have very limited support for them, tailored for preprocessing, so that you can specify a rule of the form *.ml -> *.pp.ml, with exactly these suffixes, e.g.,

(library
 (name foo)
 (preprocess (action (run cpp %{input-file}))))

And then if you have a file bar.ml

#define X 1

let x = X

It will be preprocessed to a bar.pp.ml file, which will be dropped in the build directory and used instead of bar.ml. This is how the mechanism works, and it is designed to work only with OCaml source files. If it suits you, you just need to fix the suffixes, i.e., rename your .ll files to .ml and specify a preprocess stanza that uses your preprocessor instead of the cpp that I used in the example.
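
As a rough illustration of what such a preprocessor could look like when written in OCaml, here is a minimal sketch. The file name pp.ml, the binding name source, and the idea of embedding the raw input as a string literal are all assumptions made for this example, not something the original setup prescribes. It relies only on the fact that a preprocessing action reads the file passed as %{input-file} and writes the preprocessed source to stdout.

```ocaml
(* pp.ml: hypothetical preprocessor, a sketch rather than a tested tool.
   It reads the file named on the command line and prints OCaml source
   to stdout, which dune uses as the preprocessed module. *)

(* Read a whole file into a string. *)
let read_file path =
  let ic = open_in_bin path in
  Fun.protect
    ~finally:(fun () -> close_in ic)
    (fun () -> really_input_string ic (in_channel_length ic))

(* Wrap raw text in an OCaml binding; %S emits an escaped string literal. *)
let wrap contents = Printf.sprintf "let source = %S\n" contents

let () =
  match Array.to_list Sys.argv with
  | _ :: path :: _ -> print_string (wrap (read_file path))
  | _ -> prerr_endline "usage: pp <input-file>"
```

In the preprocess stanza, cpp would then be replaced by the path to this executable, and each renamed test input would become a module exposing its original text as source.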

The mechanism described above is called "preprocessing via user actions", and it should not be confused with the more general custom rule stanza (which also uses actions). A common use of this stanza is to define rules of the form,

(rule
 (target foo.data)
 (deps foo.data.src)
 (action
  (with-stdin-from %{deps}
   (with-stdout-to %{target}
    (chdir %{workspace_root}
     (run ./tools/my_rewriter.sh))))))

where ./tools/my_rewriter.sh will receive the contents of foo.data.src in stdin and everything it prints will be redirected to foo.data. (Note that ./tools/my_rewriter.sh is the path from the top-level of your project). You can't specify a wildcard, like

(target *.data)
(deps *.data.src)

and expect it to be called for each file with the matching suffixes. Again, at the time of writing, such a mechanism is not implemented in dune. You have, however, two options as workarounds.

Option 1. Autogenerating the Rules

You can either rely on the OCaml Syntax and produce the dune file that contains such a rule replicated for each *.data.src in the folder. I wouldn't personally recommend this, as the status of the OCaml Syntax support is not clear and it might misbehave in general.

Alternatively, you can add an extra stage to your build process, e.g., a ./configure script that will generate the dune file with all these rules.

You can also write them manually, of course.
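
The extra generation stage mentioned above could be sketched in OCaml roughly as follows. The file name gen_rules.ml is hypothetical, and the stanza it prints simply replicates the foo.data rule shown earlier for every *.data.src file in the current directory. Treat it as a starting point under those assumptions, not a finished generator.

```ocaml
(* gen_rules.ml: hypothetical generator, meant to be run before dune
   (for example from a ./configure script) with stdout redirected to
   the dune file that should hold the rules. *)

let suffix = ".data.src"

(* Build the rule stanza for one input file such as "foo.data.src".
   Filename.chop_suffix raises Invalid_argument if ".src" is missing. *)
let rule_for src =
  let target = Filename.chop_suffix src ".src" in
  String.concat "\n"
    [ "(rule";
      Printf.sprintf " (target %s)" target;
      Printf.sprintf " (deps %s)" src;
      " (action";
      "  (with-stdin-from %{deps}";
      "   (with-stdout-to %{target}";
      "    (chdir %{workspace_root}";
      "     (run ./tools/my_rewriter.sh))))))";
      "" ]

(* Print one rule per matching file, in a stable order. *)
let () =
  Sys.readdir "."
  |> Array.to_list
  |> List.filter (fun f -> Filename.check_suffix f suffix)
  |> List.sort compare
  |> List.iter (fun f -> print_endline (rule_for f))
```

Sorting the file names keeps the generated dune file stable between runs, which avoids spurious diffs if the file is checked in.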

Option 2. Using Globs and Directory Dependencies

You can use glob_files and then change your action so that it takes a set of files and produces a set of files, e.g., using GNU parallel,

(rule
  (deps (glob_files *.data.in))
  (action (run parallel cp {} {.} ::: %{deps})))

For each <foo>.data.in, this rule will produce <foo>.data. (Of course, you can write your own for loop instead of using parallel.)

The caveat with this approach is that, since this rule doesn't specify targets, all produced files will eventually be deleted by dune. The problem is that, unlike deps, the targets stanza doesn't accept glob_files, which makes perfect sense, as the targets are not expected to exist at the time the rule is applied.

To the rescue comes the new directory-targets feature. To enable it, you need the following in your dune-project (the lang version must be greater than or equal to 3.0):

(lang dune 3.0)
(using directory-targets 0.1)

Now you can put the test input data files that you would like to preprocess in the same folder as your test driver. In this case, I use *.data.src as the input files and test_foo.ml as the test driver,

(rule
 (deps (glob_files *.data.src))
 (target (dir data))
 (action
  (progn
    (run mkdir -p data)
    (run parallel cp {} data/{.} ::: %{deps}))))


(test
 (name test_foo)
 (deps data))

The (run parallel cp {} data/{.} ::: %{deps}) action will call cp <file>.data.src data/<file>.data for each file <file>.data.src matching *.data.src. You can substitute it with your own command that takes the set of input files and populates the data directory with the preprocessed files. This command could even be implemented in OCaml: just specify ./path/to/your/tool.exe as the command and dune will build it automatically from ./path/to/your/tool.ml.
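
A possible shape for such an OCaml tool is sketched below. The command-line convention (output directory first, then the input files), the function names, and the identity transform are assumptions made for illustration. Sys.mkdir requires OCaml 4.12 or later.

```ocaml
(* tool.ml: hypothetical replacement for the "parallel cp" step.
   Assumed usage: tool.exe <output-dir> <input>.data.src ... *)

(* Map "foo.data.src" to "<dir>/foo.data".
   Filename.chop_suffix raises Invalid_argument if ".src" is missing. *)
let target_name ~dir src =
  Filename.concat dir (Filename.chop_suffix (Filename.basename src) ".src")

(* Placeholder transformation; a real tool would preprocess the text here. *)
let transform contents = contents

(* Read a whole file into a string. *)
let read_file path =
  let ic = open_in_bin path in
  Fun.protect
    ~finally:(fun () -> close_in ic)
    (fun () -> really_input_string ic (in_channel_length ic))

(* Write a string to a file, replacing any previous contents. *)
let write_file path contents =
  let oc = open_out_bin path in
  Fun.protect
    ~finally:(fun () -> close_out oc)
    (fun () -> output_string oc contents)

let () =
  match Array.to_list Sys.argv with
  | _ :: dir :: inputs ->
    (* Create the output directory if the rule has not done so already. *)
    if not (Sys.file_exists dir) then Sys.mkdir dir 0o755;
    List.iter
      (fun src -> write_file (target_name ~dir src) (transform (read_file src)))
      inputs
  | _ -> prerr_endline "usage: tool <output-dir> <input>.data.src ..."
```

With this convention, the parallel invocation in the rule would become something like (run ./path/to/your/tool.exe data %{deps}), so the tool receives every globbed input in a single call.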

In this setup, whenever you change an input *.data.src file, or any other dependency of the test, dune test will rebuild the data folder and correctly rerun the tests.

For the sake of completeness, here are the contents of my test_foo.ml file,

open Printf

let () =
  Sys.readdir "data" |> Array.iter @@ fun file ->
  if Filename.check_suffix file ".data"
  then printf "testing with %s\n%!" file
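
The driver above only prints the file names. A slightly fuller sketch that reads each generated file and fails the run when a check fails might look like the following. The check function is a placeholder assumption standing in for whatever the lexer or parser test would actually assert. It relies on dune treating a non-zero exit code as a test failure.

```ocaml
(* Hypothetical extension of test_foo.ml: run a check on every data file
   and exit with a non-zero status if any check fails. *)

(* Placeholder check; a real suite would call the lexer or parser here. *)
let check contents = String.length contents >= 0

(* Read a whole file into a string. *)
let read_file path =
  let ic = open_in_bin path in
  Fun.protect
    ~finally:(fun () -> close_in ic)
    (fun () -> really_input_string ic (in_channel_length ic))

(* Sorted list of the "*.data" files in dir, or [] if dir is missing. *)
let data_files dir =
  if Sys.file_exists dir && Sys.is_directory dir then
    Sys.readdir dir
    |> Array.to_list
    |> List.filter (fun f -> Filename.check_suffix f ".data")
    |> List.sort compare
  else []

let () =
  let failed =
    List.filter
      (fun f -> not (check (read_file (Filename.concat "data" f))))
      (data_files "data")
  in
  List.iter (Printf.printf "FAILED: %s\n%!") failed;
  if failed <> [] then exit 1
```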

And here's a sample directory structure,

$ tree
.
|-- bar.ml
|-- dune
|-- dune-project
`-- test
    |-- bar.data.src
    |-- dune
    |-- foo.data.src
    `-- test_foo.ml

1 directory, 7 files

Feel free to poke me if you want to get a fully working example.

huangapple
  • Published on June 16, 2023 at 14:14:08
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76487382.html