How to skip weights init when loading a pretrained transformers model?


Question


I need to find out how to load a pretrained transformer model without initializing its weights first (to save time and memory).

  1. I saw this code example, but it is not elegant:

    import torch
    from transformers import AutoModelForCausalLM

    def skip(*args, **kwargs):
        pass  # no-op replacement for the init functions

    # preserve the original init functions
    saved_inits = (torch.nn.init.kaiming_uniform_,
                   torch.nn.init.uniform_,
                   torch.nn.init.normal_)

    # monkeypatch them with the no-op so from_pretrained skips the random init pass
    torch.nn.init.kaiming_uniform_ = skip
    torch.nn.init.uniform_ = skip
    torch.nn.init.normal_ = skip

    model = AutoModelForCausalLM.from_pretrained(pretrained_model_name_or_path=args.model_path)

    # restore the original init functions
    (torch.nn.init.kaiming_uniform_,
     torch.nn.init.uniform_,
     torch.nn.init.normal_) = saved_inits
    
  2. For nn.Module subclasses there is torch.nn.utils.skip_init, but it won't work with AutoModelForCausalLM (see the sketch below).
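
For reference, here is a minimal sketch of how skip_init is used with a plain nn.Module subclass: it takes the module class and its constructor arguments, which is why it does not plug into AutoModelForCausalLM.from_pretrained directly. The Linear layer is just an illustrative example.

    import torch
    from torch.nn.utils import skip_init

    # Parameters are allocated but the usual random-init routines are skipped,
    # so their values are whatever happened to be in memory.
    layer = skip_init(torch.nn.Linear, in_features=1024, out_features=1024)
    print(layer.weight.shape)  # torch.Size([1024, 1024])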

Quest: find a way to skip weight initialization in AutoModelForCausalLM (or any similar transformers class), either using some standard wrapper or parameter.

Answer 1

Score: 1


The answer was suggested in the comment by cronoik:

    model = AutoModelForCausalLM.from_pretrained(
        pretrained_model_name_or_path=args.model_path,
        low_cpu_mem_usage=True,  # load weights directly instead of running random init first
    )

I tested it with Llama 30B and found 3x acceleration in loading time, though no gain in memory use.
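
A quick way to check the loading-time difference on your own setup is to time both variants. This is a minimal sketch; model_path is a placeholder for your checkpoint, and low_cpu_mem_usage=True may require the accelerate package to be installed.

    import time
    from transformers import AutoModelForCausalLM

    model_path = "path/to/model"  # placeholder; substitute your own checkpoint

    for low_mem in (False, True):
        start = time.perf_counter()
        model = AutoModelForCausalLM.from_pretrained(model_path, low_cpu_mem_usage=low_mem)
        print(f"low_cpu_mem_usage={low_mem}: {time.perf_counter() - start:.1f}s")
        del model  # free the model before the next run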

huangapple
  • Published on 2023-05-29 18:33:20
  • Please keep this link when reposting: https://go.coder-hub.com/76356591.html