How can I specify which GPU to use when using Huggingface Trainer
Question
HuggingFace offers training_args like the ones below. When I use the HF Trainer to train my model, I found that cuda:0 is used by default.
I went through the HuggingFace Docs, but I still don't know how to specify which GPU to run on when using the HF Trainer.
from transformers import TrainingArguments

training_args = TrainingArguments(
output_dir='./results', # output directory
num_train_epochs=3, # total # of training epochs
per_device_train_batch_size=16, # batch size per device during training
per_device_eval_batch_size=64, # batch size for evaluation
warmup_steps=500, # number of warmup steps for learning rate scheduler
weight_decay=0.01, # strength of weight decay
logging_dir='./logs', # directory for storing logs
)
Answer 1
Score: 4
The most common and practical way to control which GPU is used is to set the CUDA_VISIBLE_DEVICES environment variable.
If you want to set it on the command line when running a Python script, you can do it like this:
CUDA_VISIBLE_DEVICES=1 python train.py
Alternatively, you can insert this code before importing PyTorch or any other CUDA-based library (such as HuggingFace Transformers):
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "1" # or "0,1" for multiple GPUs
This way, regardless of how many GPUs you have on your machine, the Hugging Face Trainer will only be able to see and use the GPU(s) that you have specified.
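For reference, here is a minimal sketch of how this combines with the TrainingArguments from the question; the Trainer, model, and dataset lines are left as commented placeholders, since they are not part of the original post:

# Set the variable before importing torch/transformers, otherwise CUDA has
# already enumerated every device and the restriction has no effect.
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "1"   # expose only physical GPU 1

import torch
from transformers import TrainingArguments

# Inside this process the selected GPU is re-indexed as cuda:0, so the
# Trainer's default device placement now lands on physical GPU 1.
print(torch.cuda.device_count())       # -> 1
print(torch.cuda.get_device_name(0))   # name of physical GPU 1

training_args = TrainingArguments(
    output_dir='./results',
    num_train_epochs=3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=64,
    warmup_steps=500,
    weight_decay=0.01,
    logging_dir='./logs',
)

# trainer = Trainer(model=model, args=training_args, train_dataset=train_dataset)
# trainer.train()   # runs on the single visible GPU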
Comments