How can I specify which GPU to use when using Huggingface Trainer

Question

HuggingFace offers TrainingArguments like the example below. When I use the HF Trainer to train my model, I found that cuda:0 is used by default.

I went through the HuggingFace docs, but I still don't know how to specify which GPU to run on when using the HF Trainer.

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir='./results',          # output directory
    num_train_epochs=3,              # total number of training epochs
    per_device_train_batch_size=16,  # batch size per device during training
    per_device_eval_batch_size=64,   # batch size for evaluation
    warmup_steps=500,                # number of warmup steps for learning rate scheduler
    weight_decay=0.01,               # strength of weight decay
    logging_dir='./logs',            # directory for storing logs
)
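
For reference, the device the Trainer will place the model on can be read off the arguments object. A minimal check, assuming torch is installed alongside transformers and using the training_args defined above:

# On a multi-GPU machine with no restriction, the Trainer defaults to the first GPU.
print(training_args.device)   # cuda:0
print(training_args.n_gpu)    # number of GPUs visible to the Trainer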

Answer 1

Score: 4

The most common and practical way to control which GPU to use is to set the CUDA_VISIBLE_DEVICES environment variable.

If you want to set this option on the command line when running a Python script, you can do it like this:

CUDA_VISIBLE_DEVICES=1 python train.py
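
Note that inside the launched process the selected GPU is re-indexed, so it shows up as cuda:0. As a quick sanity check, a hypothetical train.py could start like this (a minimal sketch, assuming PyTorch is installed):

# train.py (sketch) -- launched with: CUDA_VISIBLE_DEVICES=1 python train.py
import torch

# Only the GPUs listed in CUDA_VISIBLE_DEVICES are visible, re-indexed from 0.
print(torch.cuda.device_count())      # 1
print(torch.cuda.get_device_name(0))  # name of physical GPU 1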

Alternatively, you can insert this code before the import of PyTorch or any other CUDA-based library (like HuggingFace Transformers):

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "1"  # or "0,1" for multiple GPUs

This way, regardless of how many GPUs you have on your machine, the Hugging Face Trainer will only be able to see and use the GPU(s) that you have specified.
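
Putting it together, a minimal end-to-end sketch (assuming a single machine with torch and transformers installed, and that the variable is set before any CUDA-based import):

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "1"   # must run before torch/transformers are imported

import torch
from transformers import TrainingArguments

training_args = TrainingArguments(output_dir='./results')

print(torch.cuda.device_count())   # 1 -- only the selected GPU is visible
print(training_args.device)        # cuda:0 -- physical GPU 1, re-indexed inside this process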
