How to continue training with HuggingFace Trainer?


Question

To continue training a model with HuggingFace's Seq2SeqTrainer, you can set the max_steps parameter in the Seq2SeqTrainingArguments to the total number of steps you want for the entire training session, including both the initial and continuation steps, and then resume from the last saved checkpoint. Here's how you can do it:

from transformers import Seq2SeqTrainer, Seq2SeqTrainingArguments

# Assuming you've defined your training_args and trainer for the initial training

# Continue training with more steps (e.g., 160 more on top of the original 16).
# Raise max_steps to the total number of steps you want across both runs.
training_args.max_steps = 176  # 16 (already done) + 160 (additional)

# Instantiate the trainer again for continuation
trainer = Seq2SeqTrainer(
    model=multibert,
    tokenizer=tokenizer,
    args=training_args,
    train_dataset=train_data,
    eval_dataset=val_data,
)

# Continue training for the additional steps. resume_from_checkpoint=True reloads the
# latest checkpoint in output_dir (weights, optimizer state and step counter), so
# training resumes at step 16 and runs until max_steps=176.
trainer.train(resume_from_checkpoint=True)

Make sure to adjust the max_steps parameter to the total number of steps you want for both the initial training and continuation.
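
If you would rather not hard-code the total, a minimal sketch (assuming the trainer object from the initial run is still in scope, before it is re-instantiated) derives the new max_steps from the steps already completed:

# Hypothetical alternative to the hard-coded 176: read the number of steps the
# first run finished from its TrainerState and add the extra steps on top.
extra_steps = 160
training_args.max_steps = trainer.state.global_step + extra_steps  # 16 + 160 = 176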

Original question (English):

When training a model with a Huggingface Trainer object, e.g. from https://www.kaggle.com/code/alvations/neural-plasticity-bert2bert-on-wmt14

from transformers import Seq2SeqTrainer, Seq2SeqTrainingArguments

import os
os.environ["WANDB_DISABLED"] = "true"

batch_size = 2

# set training arguments - these params are not really tuned, feel free to change
training_args = Seq2SeqTrainingArguments(
    output_dir="./",
    evaluation_strategy="steps",
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    predict_with_generate=True,
    logging_steps=2,  # set to 1000 for full training
    save_steps=16,    # set to 500 for full training
    eval_steps=4,     # set to 8000 for full training
    warmup_steps=1,   # set to 2000 for full training
    max_steps=16,     # delete for full training
    # overwrite_output_dir=True,
    save_total_limit=1,
    #fp16=True, 
)


# instantiate trainer
trainer = Seq2SeqTrainer(
    model=multibert,
    tokenizer=tokenizer,
    args=training_args,
    train_dataset=train_data,
    eval_dataset=val_data,
)

trainer.train()

When it finishes training, it outputs:

TrainOutput(global_step=16, training_loss=10.065429925918579, metrics={'train_runtime': 541.4209, 'train_samples_per_second': 0.059, 'train_steps_per_second': 0.03, 'total_flos': 19637939109888.0, 'train_loss': 10.065429925918579, 'epoch': 0.03})
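
Since continuing relies on the checkpoint written by save_steps=16, one way to confirm what the run left behind in output_dir (a small sketch, assuming the settings above) is:

from transformers.trainer_utils import get_last_checkpoint

# With save_steps=16 and max_steps=16, the run should have written a
# "checkpoint-16" folder into output_dir ("./").
last_ckpt = get_last_checkpoint("./")
print(last_ckpt)  # e.g. "./checkpoint-16", or None if no checkpoint was saved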

If we want to continue training with more steps, e.g. another 160 steps (max_steps=160) on top of the 16 steps from the previous trainer.train() run (max_steps=16), do we do something like this?

from transformers import Seq2SeqTrainer, Seq2SeqTrainingArguments

import os
os.environ["WANDB_DISABLED"] = "true"

batch_size = 2

# set training arguments - these params are not really tuned, feel free to change
training_args = Seq2SeqTrainingArguments(
    output_dir="./",
    evaluation_strategy="steps",
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    predict_with_generate=True,
    logging_steps=2,  # set to 1000 for full training
    save_steps=16,    # set to 500 for full training
    eval_steps=4,     # set to 8000 for full training
    warmup_steps=1,   # set to 2000 for full training
    max_steps=16,     # delete for full training
    # overwrite_output_dir=True,
    save_total_limit=1,
    #fp16=True, 
)


# instantiate trainer
trainer = Seq2SeqTrainer(
    model=multibert,
    tokenizer=tokenizer,
    args=training_args,
    train_dataset=train_data,
    eval_dataset=val_data,
)

# First 16 steps.
trainer.train()


# set training arguments - these params are not really tuned, feel free to change
training_args_2 = Seq2SeqTrainingArguments(
    output_dir="./",
    evaluation_strategy="steps",
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    predict_with_generate=True,
    logging_steps=2,  # set to 1000 for full training
    save_steps=16,    # set to 500 for full training
    eval_steps=4,     # set to 8000 for full training
    warmup_steps=1,   # set to 2000 for full training
    max_steps=160,     # delete for full training
    # overwrite_output_dir=True,
    save_total_limit=1,
    #fp16=True, 
)


# instantiate trainer
trainer = Seq2SeqTrainer(
    model=multibert,
    tokenizer=tokenizer,
    args=training_args_2,
    train_dataset=train_data,
    eval_dataset=val_data,
)

# Continue training for 160 steps
trainer.train()

If the above is not the canonical way to continue training a model, how do we continue training with the HuggingFace Trainer?


Edited

With transformers version 4.29.1, trying @maciej-skorski's answer with Seq2SeqTrainer,

trainer = Seq2SeqTrainer(
    model=multibert,
    tokenizer=tokenizer,
    args=training_args,
    train_dataset=train_data,
    eval_dataset=val_data,
    resume_from_checkpoint=True
)

It's throwing an error:

TypeError: Seq2SeqTrainer.__init__() got an unexpected keyword argument 'resume_from_checkpoint'

Answer 1

Score: 2


If your use case is about adjusting a somewhat-trained model, then it can be solved the same way as fine-tuning. To this end, you pass the current model state along with a new parameter config to the Trainer object in the PyTorch API. I would say this is canonical.

The code you proposed matches the general fine-tuning pattern from the huggingface docs:

trainer = Trainer(
    model,
    args=training_args,
    tokenizer=tokenizer,
    train_dataset=...,
    eval_dataset=...,
)

You may also resume training from existing checkpoints:

trainer.train(resume_from_checkpoint=True)
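
Applied to the question's setup (and to the TypeError from the edit), a minimal sketch could look like the following; note that resume_from_checkpoint is an argument of train(), not of the Seq2SeqTrainer constructor, and the names multibert, train_data and val_data are taken from the question:

# Raise the step budget beyond the 16 steps already completed.
training_args.max_steps = 176

trainer = Seq2SeqTrainer(
    model=multibert,
    tokenizer=tokenizer,
    args=training_args,
    train_dataset=train_data,
    eval_dataset=val_data,
)

# resume_from_checkpoint=True reloads the latest checkpoint-* folder in output_dir;
# a string path to a specific checkpoint also works.
trainer.train(resume_from_checkpoint=True)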
