How to continue training with HuggingFace Trainer?
To continue training a model with HuggingFace's Seq2SeqTrainer, you can set the max_steps parameter in the Seq2SeqTrainingArguments to the total number of steps you want for the entire training session, including both the initial and the continuation steps. Here's how you can do it:
from transformers import Seq2SeqTrainer, Seq2SeqTrainingArguments

# Assuming you've defined your training_args and trainer for the initial
# training run (max_steps=16) and want to continue for 160 more steps,
# set max_steps to the total number of steps you want (16 + 160).
training_args.max_steps = 176

# Instantiate the trainer again for the continuation
trainer = Seq2SeqTrainer(
    model=multibert,
    tokenizer=tokenizer,
    args=training_args,
    train_dataset=train_data,
    eval_dataset=val_data,
)

# Continue training for the additional steps
trainer.train()
Make sure to adjust the max_steps parameter to the total number of steps you want for both the initial training and the continuation.
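Note that simply re-running trainer.train() with a new Trainer starts from whatever weights the multibert object currently holds, but the optimizer and learning-rate scheduler start fresh. As a minimal sketch, assuming the first run wrote a checkpoint under output_dir (it does so every save_steps steps), you can instead resume from that checkpoint so the optimizer and scheduler state are restored as well:

# Resume from the latest checkpoint found in output_dir; training then
# continues from global step 16 up to the new max_steps=176.
trainer.train(resume_from_checkpoint=True)

# Or point at a specific checkpoint directory (the path here is hypothetical):
# trainer.train(resume_from_checkpoint="./checkpoint-16")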
Question
When training a model with the HuggingFace Trainer object, e.g. from https://www.kaggle.com/code/alvations/neural-plasticity-bert2bert-on-wmt14:
from transformers import Seq2SeqTrainer, Seq2SeqTrainingArguments
import os

os.environ["WANDB_DISABLED"] = "true"

batch_size = 2

# set training arguments - these params are not really tuned, feel free to change
training_args = Seq2SeqTrainingArguments(
    output_dir="./",
    evaluation_strategy="steps",
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    predict_with_generate=True,
    logging_steps=2,  # set to 1000 for full training
    save_steps=16,    # set to 500 for full training
    eval_steps=4,     # set to 8000 for full training
    warmup_steps=1,   # set to 2000 for full training
    max_steps=16,     # delete for full training
    # overwrite_output_dir=True,
    save_total_limit=1,
    # fp16=True,
)

# instantiate trainer
trainer = Seq2SeqTrainer(
    model=multibert,
    tokenizer=tokenizer,
    args=training_args,
    train_dataset=train_data,
    eval_dataset=val_data,
)

trainer.train()
When it finishes training, it outputs:
TrainOutput(global_step=16, training_loss=10.065429925918579, metrics={'train_runtime': 541.4209, 'train_samples_per_second': 0.059, 'train_steps_per_second': 0.03, 'total_flos': 19637939109888.0, 'train_loss': 10.065429925918579, 'epoch': 0.03})
If we want to continue training with more steps, e.g. max_steps=16 (from the previous trainer.train() run) and another max_steps=160, do we do something like this?
from transformers import Seq2SeqTrainer, Seq2SeqTrainingArguments
import os

os.environ["WANDB_DISABLED"] = "true"

batch_size = 2

# set training arguments - these params are not really tuned, feel free to change
training_args = Seq2SeqTrainingArguments(
    output_dir="./",
    evaluation_strategy="steps",
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    predict_with_generate=True,
    logging_steps=2,  # set to 1000 for full training
    save_steps=16,    # set to 500 for full training
    eval_steps=4,     # set to 8000 for full training
    warmup_steps=1,   # set to 2000 for full training
    max_steps=16,     # delete for full training
    # overwrite_output_dir=True,
    save_total_limit=1,
    # fp16=True,
)

# instantiate trainer
trainer = Seq2SeqTrainer(
    model=multibert,
    tokenizer=tokenizer,
    args=training_args,
    train_dataset=train_data,
    eval_dataset=val_data,
)

# First 16 steps.
trainer.train()

# set training arguments - these params are not really tuned, feel free to change
training_args_2 = Seq2SeqTrainingArguments(
    output_dir="./",
    evaluation_strategy="steps",
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    predict_with_generate=True,
    logging_steps=2,  # set to 1000 for full training
    save_steps=16,    # set to 500 for full training
    eval_steps=4,     # set to 8000 for full training
    warmup_steps=1,   # set to 2000 for full training
    max_steps=160,    # delete for full training
    # overwrite_output_dir=True,
    save_total_limit=1,
    # fp16=True,
)

# instantiate trainer
trainer = Seq2SeqTrainer(
    model=multibert,
    tokenizer=tokenizer,
    args=training_args_2,
    train_dataset=train_data,
    eval_dataset=val_data,
)

# Continue training for 160 steps
trainer.train()
If the above is not the canonical way to continue training a model, how do we continue training with the HuggingFace Trainer?
Edited
With transformers version 4.29.1, trying @maciej-skorski's answer with Seq2SeqTrainer,
trainer = Seq2SeqTrainer(
    model=multibert,
    tokenizer=tokenizer,
    args=training_args,
    train_dataset=train_data,
    eval_dataset=val_data,
    resume_from_checkpoint=True,
)
It's throwing an error:
TypeError: Seq2SeqTrainer.__init__() got an unexpected keyword argument 'resume_from_checkpoint'
Answer 1
Score: 2
If your use case is about adjusting a somewhat-trained model, it can be solved just the same way as fine-tuning: you pass the current model state along with a new parameter config to the Trainer object of the PyTorch API. I would say this is canonical.
The code you proposed matches the general fine-tuning pattern from the huggingface docs:
trainer = Trainer(
    model,
    args=training_args,
    tokenizer=tokenizer,
    train_dataset=...,
    eval_dataset=...,
)
You may also resume training from existing checkpoints:
trainer.train(resume_from_checkpoint=True)
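Regarding the edit above: resume_from_checkpoint is a parameter of trainer.train(), not of the Seq2SeqTrainer constructor, which is why transformers 4.29.1 raises the TypeError. A minimal sketch with Seq2SeqTrainer, reusing the question's objects (multibert, tokenizer, train_data, val_data and training_args_2 are assumed to be defined as in the question):

from transformers import Seq2SeqTrainer

# Build the trainer without resume_from_checkpoint ...
trainer = Seq2SeqTrainer(
    model=multibert,
    tokenizer=tokenizer,
    args=training_args_2,  # e.g. the second config with max_steps=160
    train_dataset=train_data,
    eval_dataset=val_data,
)

# ... and pass it to train() instead, so the model, optimizer and scheduler
# state are restored from the last checkpoint saved in output_dir.
trainer.train(resume_from_checkpoint=True)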
Comments