How to add/update data with a DVC workflow?

Question

When setting up DVC, can I simply add my entire directory with dvc add dataset, so that my workflow is to update the whole dataset folder for each new iteration? The contents of this folder should be cached, and if I ever want to go back to a previous version of the data, I should be able to run dvc checkout. Or is it better to add each file to DVC individually?

    .dvc
    - config
    dataset
    - fileone.cvs
    - train.py
    - requirements.txt

I have tracked individual files so far, but would it be easier to track the entire folder if I have hundreds of files?

Answer 1

Score: 3

Yes, the whole directory can be added at once, and this is the recommended way to handle directories in DVC. Having hundreds of .dvc files is discouraged and is not what DVC is optimized for.

There is an example in the documentation. In short, you can run:

    dvc add dataset

No matter how many files are inside the dataset directory, DVC creates a single dataset.dvc file that tracks the whole directory. The files are cached, once per unique file per dataset.
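To make "once per unique file" concrete, here is a toy sketch of content-based deduplication. It does not call DVC. cksum stands in for the content hash DVC computes, and the file names are made up for illustration.

```shell
#!/bin/sh
# Toy model only: cksum stands in for the content hash that DVC computes.
# File names are hypothetical.
tmp=$(mktemp -d)
printf 'a,b\n1,2\n' > "$tmp/one.csv"
cp "$tmp/one.csv" "$tmp/copy.csv"      # same content under a different name
printf 'a,b\n3,4\n' > "$tmp/two.csv"

# Count the files in the directory.
total=$(ls "$tmp" | wc -l | tr -d ' ')

# Hash the contents only (no file names), then count the distinct hashes.
unique=$(for f in "$tmp"/*.csv; do cksum < "$f"; done | sort -u | wc -l | tr -d ' ')

echo "files: $total, unique contents: $unique"
rm -rf "$tmp"
```

The directory holds three files but only two distinct contents, so a cache keyed by content would need to store only two objects.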

To update it later, run dvc add or dvc commit after changing the files. To get back to a previous version, you can use the same mechanics as described in the DVC documentation.
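The update and rollback steps above can be sketched as follows, assuming a repository where both git init and dvc init have already been run and dataset sits at the repository root. The run helper and the commit messages are illustrative assumptions. By default the script only prints the commands; set RUN=1 to execute them.

```shell
#!/bin/sh
# Sketch of a directory-level DVC workflow. Commands are printed by default;
# set RUN=1 to execute them inside a Git + DVC repository.
# The run helper is a hypothetical convenience, not part of DVC.
run() {
    echo "+ $*"
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    fi
}

# 1. Track the whole directory; DVC writes a single dataset.dvc file.
run dvc add dataset
run git add dataset.dvc .gitignore
run git commit -m "Track dataset, first version"

# 2. After changing files inside dataset/, record the new state.
run dvc add dataset
run git add dataset.dvc
run git commit -m "Update dataset"

# 3. Go back to the previous data version: restore the older dataset.dvc
#    from Git, then let DVC sync the workspace from its cache.
run git checkout HEAD~1 -- dataset.dvc
run dvc checkout dataset.dvc
```

Git versions the small dataset.dvc file while DVC keeps the data itself in its cache, so switching data versions comes down to checking out the matching dataset.dvc and running dvc checkout.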

There is also a brief summary of some technical details that I recommend reading if you'd like to understand the implications better.

If there are a lot of files inside the directory, please also read Large Dataset Optimization.

huangapple
  • Posted on April 4, 2023
  • Link to this article: https://go.coder-hub.com/75923500.html