How to load a fine-tuned peft/lora model based on llama with Huggingface transformers?

Question

I've followed this tutorial (colab notebook) in order to finetune my model.

Trying to load my locally saved model

    model = AutoModelForCausalLM.from_pretrained("finetuned_model")

yields Killed.


Trying to load the model from the hub:

    import torch
    from peft import PeftModel, PeftConfig
    from transformers import AutoModelForCausalLM, AutoTokenizer

    peft_model_id = "lucas0/empath-llama-7b"
    config = PeftConfig.from_pretrained(peft_model_id)
    model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path, return_dict=True, load_in_8bit=True, device_map='auto')
    tokenizer = AutoTokenizer.from_pretrained(cwd+"/tokenizer.model")

    # Load the Lora model
    model = PeftModel.from_pretrained(model, peft_model_id)

yields

    AttributeError: /home/ubuntu/empath/lora/venv/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so: undefined symbol: cget_col_row_stats

full stacktrace

Model Creation:

I have fine-tuned a model using PEFT and LoRA:

    model = AutoModelForCausalLM.from_pretrained(
        "decapoda-research/llama-7b-hf",
        torch_dtype=torch.float16,
        device_map='auto',
    )

I had to download and manually specify the llama tokenizer.

    tokenizer = LlamaTokenizer(cwd+"/tokenizer.model")
    tokenizer.pad_token = tokenizer.eos_token

As for the training:

    import pandas as pd
    import transformers
    from datasets import Dataset
    from peft import LoraConfig, get_peft_model

    config = LoraConfig(
        r=8,
        lora_alpha=16,
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
        lora_dropout=0.05,
        bias="none",
        task_type="CAUSAL_LM"
    )
    model = get_peft_model(model, config)

    data = pd.read_csv("my_csv.csv")
    dataset = Dataset.from_pandas(data)
    tokenized_dataset = dataset.map(lambda samples: tokenizer(samples["text"]))

    trainer = transformers.Trainer(
        model=model,
        train_dataset=tokenized_dataset,
        args=transformers.TrainingArguments(
            per_device_train_batch_size=4,
            gradient_accumulation_steps=4,
            warmup_steps=100,
            max_steps=100,
            learning_rate=1e-3,
            fp16=True,
            logging_steps=1,
            output_dir='outputs',
        ),
        data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False)
    )
    model.config.use_cache = True  # silence the warnings. Please re-enable for inference!
    trainer.train()

and saved it locally with:

    trainer.save_model(cwd+"/finetuned_model")
    print("saved trainer locally")

as well as to the hub:

    model.push_to_hub("lucas0/empath-llama-7b", create_pr=1)

How can I load my finetuned model?

Answer 1

Score: 3

To load a fine-tuned peft/lora model, take a look at the Guanaco example: https://stackoverflow.com/a/76372390/610569

    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer, LlamaTokenizer, StoppingCriteria, StoppingCriteriaList, TextIteratorStreamer

    model_name = "decapoda-research/llama-7b-hf"
    adapters_name = "lucas0/empath-llama-7b"

    print(f"Starting to load the model {model_name} into memory")

    m = AutoModelForCausalLM.from_pretrained(
        model_name,
        #load_in_4bit=True,
        torch_dtype=torch.bfloat16,
        device_map={"": 0}
    )
    m = PeftModel.from_pretrained(m, adapters_name)
    m = m.merge_and_unload()
    tok = LlamaTokenizer.from_pretrained(model_name)
    tok.bos_token_id = 1

    stop_token_ids = [0]

    print(f"Successfully loaded the model {model_name} into memory")

You will need at least an A10G GPU runtime to load the model properly.
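
Once the adapter has been merged, the model behaves like any plain causal LM. Below is a minimal generation sketch, assuming `m` and `tok` were loaded as in the snippet above; the prompt text and generation settings are placeholders, not part of the original answer.

    # Minimal generation sketch (assumes `m` and `tok` from the snippet above).
    prompt = "How do I stay motivated?"  # placeholder prompt
    inputs = tok(prompt, return_tensors="pt").to(m.device)

    with torch.no_grad():
        output_ids = m.generate(
            **inputs,
            max_new_tokens=100,   # illustrative generation settings
            do_sample=True,
            temperature=0.7,
        )

    print(tok.decode(output_ids[0], skip_special_tokens=True))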


For more details see

Answer 2

Score: 0

You can load it like this after pushing. I did this successfully using the following snippet:

    # pip install peft transformers
    import torch
    from peft import PeftModel, PeftConfig
    from transformers import LlamaTokenizer, LlamaForCausalLM
    from accelerate import infer_auto_device_map, init_empty_weights

    peft_model_id = "--path--"  # path to your fine-tuned adapter (local folder or hub repo id)
    config = PeftConfig.from_pretrained(peft_model_id)
    model1 = LlamaForCausalLM.from_pretrained(
        config.base_model_name_or_path,
        torch_dtype='auto',
        device_map='auto',
        offload_folder="offload",
        offload_state_dict=True
    )
    tokenizer = LlamaTokenizer.from_pretrained(config.base_model_name_or_path)

    # Load the Lora model
    model1 = PeftModel.from_pretrained(model1, peft_model_id)
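
If you also want the original question's plain `AutoModelForCausalLM.from_pretrained("finetuned_model")` style of loading to work, one option is to merge the adapter into the base weights and save the result as a standalone checkpoint. This is only a sketch, assuming `model1` and `tokenizer` were loaded as above; the "merged_model" directory name is a placeholder.

    # Sketch: merge the LoRA adapter into the base weights and save a standalone model
    # (assumes `model1` and `tokenizer` from the snippet above; "merged_model" is a placeholder path).
    merged = model1.merge_and_unload()       # fold the adapter weights into the base model
    merged.save_pretrained("merged_model")
    tokenizer.save_pretrained("merged_model")

    # The merged folder can then be loaded without peft, e.g.:
    # from transformers import AutoModelForCausalLM
    # model = AutoModelForCausalLM.from_pretrained("merged_model", torch_dtype='auto', device_map='auto')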
