Does dvc checkout pull the data or just check out .dvc files?
Question
I am testing data tracking with DVC (Data Version Control). I went through the example at dvc.org, added the data, and committed the generated files to Git as well. Now, after I run git checkout, do I run dvc pull or dvc checkout? What do these commands do behind the scenes?

My project layout:

project
    .dvc
    train.py
    data
        abc.csv
    data.dvc

I have initialized a Git repository, installed and initialized DVC, and added some data with the command below:

dvc add data
Answer 1
Score: 2
The dvc pull command is the same as dvc fetch + dvc checkout.

dvc fetch downloads data from remote storage (S3, Google Cloud, etc.) into the DVC cache, and dvc checkout then "instantiates" those files in the workspace.

It is not a perfect comparison, but roughly you can compare dvc pull with git pull, dvc fetch with git fetch, and dvc checkout with git checkout. They serve similar purposes, but for large files or directories that you do not want to store in Git directly and instead keep in the cloud, over SSH, on a NAS server, etc.

By the way, besides dvc add you also need to run dvc push to save your data to remote storage, so that your team (or you on a different machine) can run dvc pull later.
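
To make the relationship between these commands concrete, here is a rough toy model in Python. It is only a sketch and not how DVC is implemented: plain dicts stand in for the remote storage, the local cache, and the workspace, and every function and variable name is hypothetical. The `tracked` mapping plays the role of the data.dvc file that git checkout restores.

```python
# Toy model only: dicts stand in for DVC's remote storage, local cache,
# and workspace. None of these names are part of DVC's real API.

def push(cache, remote):
    """Roughly like `dvc push`: upload cached data to remote storage."""
    remote.update(cache)

def fetch(remote, cache, tracked):
    """Roughly like `dvc fetch`: download tracked data into the cache."""
    for content_hash in tracked.values():
        if content_hash not in cache:
            cache[content_hash] = remote[content_hash]

def checkout(cache, workspace, tracked):
    """Roughly like `dvc checkout`: place cached data in the workspace."""
    for path, content_hash in tracked.items():
        workspace[path] = cache[content_hash]

def pull(remote, cache, workspace, tracked):
    """Roughly like `dvc pull`: fetch, then checkout."""
    fetch(remote, cache, tracked)
    checkout(cache, workspace, tracked)

# Machine A: the data is in the local cache and gets pushed.
remote = {}
cache_a = {"hash-v1": "contents of abc.csv"}
push(cache_a, remote)

# Machine B: Git has restored the tracking info, but the cache is empty.
tracked = {"data/abc.csv": "hash-v1"}  # stand-in for data.dvc
cache_b, workspace_b = {}, {}

# Checkout alone cannot work here: nothing is in the cache yet.
try:
    checkout(cache_b, workspace_b, tracked)
    checkout_failed = False
except KeyError:
    checkout_failed = True

# Pull first downloads into the cache, then fills the workspace.
pull(remote, cache_b, workspace_b, tracked)
print(checkout_failed, workspace_b)
```

In this model, checkout is enough whenever the data for the version you switched to is already in the local cache, and pull is needed when it has to be downloaded from the remote first.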