How can I specify which GPU to use when using Huggingface Trainer
Question
HuggingFace offers training_args like the ones below. When I use the HF Trainer to train my model, I found that cuda:0 is used by default.
I went through the HuggingFace Docs, but I still don't know how to specify which GPU to run on when using the HF Trainer.
from transformers import TrainingArguments

training_args = TrainingArguments(
output_dir='./results', # output directory
num_train_epochs=3, # total # of training epochs
per_device_train_batch_size=16, # batch size per device during training
per_device_eval_batch_size=64, # batch size for evaluation
warmup_steps=500, # number of warmup steps for learning rate scheduler
weight_decay=0.01, # strength of weight decay
logging_dir='./logs', # directory for storing logs
)
Answer 1
Score: 4
The most common and practical way to control which GPU is used is to set the CUDA_VISIBLE_DEVICES environment variable.
If you want to set it on the command line when running a Python script, you can do it like this:
CUDA_VISIBLE_DEVICES=1 python train.py
Alternatively, you can insert this code before importing PyTorch or any other CUDA-based library (such as HuggingFace Transformers):
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "1" # or "0,1" for multiple GPUs
This way, regardless of how many GPUs you have on your machine, the Hugging Face Trainer will only be able to see and use the GPU(s) that you have specified.
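For reference, here is a minimal sketch of how this combines with the TrainingArguments from the question; the Trainer, model, and dataset lines are left as commented placeholders, since they are not part of the original post:

# Set the variable before importing torch/transformers, otherwise CUDA has
# already enumerated every device and the restriction has no effect.
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "1"   # expose only physical GPU 1

import torch
from transformers import TrainingArguments

# Inside this process the selected GPU is re-indexed as cuda:0, so the
# Trainer's default device placement now lands on physical GPU 1.
print(torch.cuda.device_count())       # -> 1
print(torch.cuda.get_device_name(0))   # name of physical GPU 1

training_args = TrainingArguments(
    output_dir='./results',
    num_train_epochs=3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=64,
    warmup_steps=500,
    weight_decay=0.01,
    logging_dir='./logs',
)

# trainer = Trainer(model=model, args=training_args, train_dataset=train_dataset)
# trainer.train()   # runs on the single visible GPU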
Comments