How is the number of steps calculated in HuggingFace trainer?
Question
I have a train dataset of size 4107.
DatasetDict({
    train: Dataset({
        features: ['input_ids'],
        num_rows: 4107
    })
    valid: Dataset({
        features: ['input_ids'],
        num_rows: 498
    })
})
In my training arguments, the batch size is 8 and number of epochs is 2.
from transformers import Trainer, TrainingArguments

args = TrainingArguments(
    output_dir="code_gen_epoch",
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    eval_steps=100,
    logging_steps=100,
    gradient_accumulation_steps=8,
    num_train_epochs=2,
    weight_decay=0.1,
    warmup_steps=1_000,
    lr_scheduler_type="cosine",
    learning_rate=3.0e-4,
    # save_steps=200,
    # fp16=True,
    load_best_model_at_end=True,
)
trainer = Trainer(
    model=model,
    tokenizer=tokenizer,
    args=args,
    data_collator=data_collator,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["valid"],
)
When I start the training, I can see that the number of steps is 128.
My assumption was that the number of steps should be approximately 4107 / 8 ≈ 512 for 1 epoch, and 512 + 512 = 1024 for 2 epochs.
I don't understand how it came to be 128.
Answer 1
Score: 5
Since you're specifying gradient_accumulation_steps=8, the effective number of steps is also divided by 8. This is because an optimizer step is not taken on every batch, but only after the gradients of a certain number of batches have been accumulated.
Hence, the resulting number of steps per epoch is 4107 instances ÷ 8 batch size ÷ 8 gradient accumulation ≈ 64, which is where the ≈ 128 steps for your 2 epochs come from. With gradient accumulation disabled (gradient_accumulation_steps=1), you would get ≈ 513 steps per epoch (4107 ÷ 8 ÷ 1), close to the 512 you expected.
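For a quick sanity check, here is a minimal sketch of that arithmetic. The variable names are my own; it only mirrors roughly how the Trainer derives its step count, and the exact rounding can vary between transformers versions:

import math

# Values from the question.
num_train_examples = 4107
per_device_train_batch_size = 8
gradient_accumulation_steps = 8
num_train_epochs = 2

# The train dataloader yields ceil(4107 / 8) = 514 batches per epoch.
batches_per_epoch = math.ceil(num_train_examples / per_device_train_batch_size)

# An optimizer step happens only every gradient_accumulation_steps batches,
# so there are 514 // 8 = 64 update steps per epoch.
update_steps_per_epoch = max(batches_per_epoch // gradient_accumulation_steps, 1)

total_steps = update_steps_per_epoch * num_train_epochs
print(update_steps_per_epoch, total_steps)  # 64 128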