How to continue training with HuggingFace Trainer?


Question

To continue training a model with HuggingFace's Seq2SeqTrainer, you can set the max_steps parameter in the Seq2SeqTrainingArguments to the total number of steps you want for the entire training session, including both the initial and continuation steps. Here's how you can do it:

from transformers import Seq2SeqTrainer, Seq2SeqTrainingArguments

# Assuming you've defined your training_args and trainer for the initial training

# Continue training with 160 more steps on top of the initial 16
# (note: continuing_training is not a standard Seq2SeqTrainingArguments field,
#  so this assignment has no effect on the Trainer itself)
training_args.continuing_training = True
training_args.max_steps = 176  # set it to the total number of steps you want (16 + 160)

# Instantiate the trainer again for continuation
trainer = Seq2SeqTrainer(
    model=multibert,
    tokenizer=tokenizer,
    args=training_args,
    train_dataset=train_data,
    eval_dataset=val_data,
)

# Continue training for additional steps
trainer.train()

Make sure to adjust the max_steps parameter to the total number of steps you want for both the initial training and continuation.
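If you would rather not hard-code the total, a minimal sketch (an assumption, not something the snippet above requires) is to read the steps already taken from the previous trainer's state before re-instantiating it:

# Sketch: derive the new max_steps from the previous run instead of hard-coding 176.
# "trainer" here is the trainer from the initial 16-step run, before re-instantiation.
previous_steps = trainer.state.global_step   # 16 after the initial run
extra_steps = 160                            # hypothetical number of additional steps
training_args.max_steps = previous_steps + extra_steps  # 176 in this example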

English:

When training a model with the Huggingface Trainer object, e.g. from https://www.kaggle.com/code/alvations/neural-plasticity-bert2bert-on-wmt14:

from transformers import Seq2SeqTrainer, Seq2SeqTrainingArguments

import os
os.environ["WANDB_DISABLED"] = "true"

batch_size = 2

# set training arguments - these params are not really tuned, feel free to change
training_args = Seq2SeqTrainingArguments(
    output_dir="./",
    evaluation_strategy="steps",
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    predict_with_generate=True,
    logging_steps=2,  # set to 1000 for full training
    save_steps=16,    # set to 500 for full training
    eval_steps=4,     # set to 8000 for full training
    warmup_steps=1,   # set to 2000 for full training
    max_steps=16,     # delete for full training
    # overwrite_output_dir=True,
    save_total_limit=1,
    #fp16=True, 
)


# instantiate trainer
trainer = Seq2SeqTrainer(
    model=multibert,
    tokenizer=tokenizer,
    args=training_args,
    train_dataset=train_data,
    eval_dataset=val_data,
)

trainer.train()

When it finishes training, it outputs:

TrainOutput(global_step=16, training_loss=10.065429925918579, metrics={'train_runtime': 541.4209, 'train_samples_per_second': 0.059, 'train_steps_per_second': 0.03, 'total_flos': 19637939109888.0, 'train_loss': 10.065429925918579, 'epoch': 0.03})
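Because save_steps=16 and output_dir="./", the run above should also have left a checkpoint folder in the output directory (the Trainer names them checkpoint-<global_step>, e.g. checkpoint-16), which is what resuming later relies on. A quick way to check:

import os

# List the checkpoint folders written by the run above (expected: something like "checkpoint-16").
print([d for d in os.listdir("./") if d.startswith("checkpoint")])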

If we want to continue training with more steps, e.g. another max_steps=160 on top of the max_steps=16 from the previous trainer.train() run, do we do something like this?

from transformers import Seq2SeqTrainer, Seq2SeqTrainingArguments

import os
os.environ["WANDB_DISABLED"] = "true"

batch_size = 2

# set training arguments - these params are not really tuned, feel free to change
training_args = Seq2SeqTrainingArguments(
    output_dir="./",
    evaluation_strategy="steps",
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    predict_with_generate=True,
    logging_steps=2,  # set to 1000 for full training
    save_steps=16,    # set to 500 for full training
    eval_steps=4,     # set to 8000 for full training
    warmup_steps=1,   # set to 2000 for full training
    max_steps=16,     # delete for full training
    # overwrite_output_dir=True,
    save_total_limit=1,
    #fp16=True, 
)


# instantiate trainer
trainer = Seq2SeqTrainer(
    model=multibert,
    tokenizer=tokenizer,
    args=training_args,
    train_dataset=train_data,
    eval_dataset=val_data,
)

# First 16 steps.
trainer.train()


# set training arguments - these params are not really tuned, feel free to change
training_args_2 = Seq2SeqTrainingArguments(
    output_dir="./",
    evaluation_strategy="steps",
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    predict_with_generate=True,
    logging_steps=2,  # set to 1000 for full training
    save_steps=16,    # set to 500 for full training
    eval_steps=4,     # set to 8000 for full training
    warmup_steps=1,   # set to 2000 for full training
    max_steps=160,     # delete for full training
    # overwrite_output_dir=True,
    save_total_limit=1,
    #fp16=True, 
)


# instantiate trainer
trainer = Seq2SeqTrainer(
    model=multibert,
    tokenizer=tokenizer,
    args=training_args_2,
    train_dataset=train_data,
    eval_dataset=val_data,
)

# Continue training for 160 steps
trainer.train()

If the above is not the canonical way to continue training a model, how to continue training with HuggingFace Trainer?


Edited

With transformers version 4.29.1, trying @maciej-skorski's answer with Seq2SeqTrainer,

trainer = Seq2SeqTrainer(
    model=multibert,
    tokenizer=tokenizer,
    args=training_args,
    train_dataset=train_data,
    eval_dataset=val_data,
    resume_from_checkpoint=True
)

it throws an error:

TypeError: Seq2SeqTrainer.__init__() got an unexpected keyword argument 'resume_from_checkpoint'

Answer 1

Score: 2


If your use-case is about adjusting a somewhat-trained model, then it can be solved the same way as fine-tuning. To this end, you pass the current model state along with a new parameter config to the Trainer object in the PyTorch API. I would say this is the canonical approach.

The code you proposed matches the general fine-tuning pattern from the huggingface docs:

trainer = Trainer(
    model,
    args=training_args,
    tokenizer=tokenizer,
    train_dataset=...,
    eval_dataset=...,
)

You may also resume training from existing checkpoints:

trainer.train(resume_from_checkpoint=True)
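Note that resume_from_checkpoint is an argument of train(), not of the Trainer/Seq2SeqTrainer constructor, which is why the call in the question's edit raises the TypeError. A minimal sketch of how this could look with the objects from the question (reusing training_args_2 with the larger max_steps is an assumption):

# Sketch: resume the earlier run from the last checkpoint saved in output_dir.
trainer = Seq2SeqTrainer(
    model=multibert,
    tokenizer=tokenizer,
    args=training_args_2,      # new config, e.g. max_steps=160
    train_dataset=train_data,
    eval_dataset=val_data,
)

# With resume_from_checkpoint=True, train() loads the latest "checkpoint-*" folder
# from output_dir; a path to a specific checkpoint also works.
trainer.train(resume_from_checkpoint=True)

Since max_steps counts from step 0, a run resumed at step 16 with max_steps=160 should train for roughly 144 more steps.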
