Hugging Face Transformers trainer: per_device_train_batch_size vs auto_find_batch_size
Question
A Hugging Face Transformers Trainer can receive a per_device_train_batch_size argument, or an auto_find_batch_size argument.

However, they seem to have different effects. One thing to consider is that per_device_train_batch_size defaults to 8: it is always set, and you can't disable it.

I have also observed that if I run into OOM errors, lowering per_device_train_batch_size can solve the issue, but auto_find_batch_size doesn't solve the problem. This is quite counter-intuitive, since it should find a batch size that is small enough (I can do it manually).

So: what does auto_find_batch_size do, exactly?
Answer 1
Score: 2
The auto_find_batch_size argument is an optional argument which can be used in addition to the per_device_train_batch_size argument.
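For concreteness, the two arguments are set together on TrainingArguments; a minimal sketch (the output_dir value is just a placeholder):

```python
from transformers import TrainingArguments

# auto_find_batch_size is a boolean toggle; per_device_train_batch_size
# supplies the starting batch size that the search will decay from.
args = TrainingArguments(
    output_dir="out",                # placeholder output directory
    per_device_train_batch_size=8,   # initial (and default) batch size
    auto_find_batch_size=True,       # halve and retry on CUDA OOM
)
```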
As you point out, lowering the batch size is one way to resolve out-of-memory errors. The auto_find_batch_size argument automates the lowering process. Enabling it will use find_executable_batch_size from accelerate, which:

> operates with exponential decay, decreasing the batch size in half after each failed run
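To make the decay concrete, here is a toy re-implementation of that retry logic (not accelerate's actual code, just a sketch of the same idea): wrap the training function in a decorator that catches OOM errors, halves the batch size, and tries again.

```python
import functools

def find_executable_batch_size_sketch(starting_batch_size=128):
    """Toy version of accelerate's decorator: halve the batch size and
    retry whenever the wrapped function raises an out-of-memory error."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            batch_size = starting_batch_size
            while batch_size >= 1:
                try:
                    return fn(batch_size, *args, **kwargs)
                except RuntimeError as e:
                    if "out of memory" not in str(e):
                        raise          # only OOM triggers the decay
                    batch_size //= 2   # exponential decay: halve and retry
            raise RuntimeError("No executable batch size found")
        return wrapper
    return decorator

attempts = []

@find_executable_batch_size_sketch(starting_batch_size=8)
def train(batch_size):
    attempts.append(batch_size)
    if batch_size > 2:                 # pretend sizes above 2 don't fit
        raise RuntimeError("CUDA out of memory")
    return batch_size
```

Calling train() here tries batch sizes 8, 4, then 2, and succeeds at 2.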
The per_device_train_batch_size is used as the initial batch size to start off with. So if you use the default of 8, it starts training with a batch size of 8 (on a single device), & if it fails, it will restart the training procedure with a batch size of 4.