How to use Huggingface GenerationMixin (or its beam search) with my own model?

Question

Huggingface's use of a mixin keeps teasing me that this should be possible, but I can't find any clear documentation on exactly what the requirements are, or whether the dependencies are just too much for it to be worth it. The central module is literally thousands and thousands of lines, and from studying it yesterday I feel I've learnt more about how to write beam search than about GenerationMixin.

From reading the source, I think the dependencies are self.config, then prepare_inputs_for_generation() and _update_model_kwargs_for_generation(), and, implicitly, forward(). But I'm not sure that is everything, nor what each of them should look like. And I think it may expect forward() to return data in a specific format.
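For what it's worth, my best guess at the shape of that contract is the sketch below (completely unverified; MyDecoderOnlyModel is made up, and confirming or correcting these signatures is essentially what I'm asking):

```python
# My unverified guess at the interface generate() wants; MyDecoderOnlyModel is made up.
import torch.nn as nn
from transformers import PretrainedConfig


class MyDecoderOnlyModel(nn.Module):
    def __init__(self, config: PretrainedConfig):
        super().__init__()
        self.config = config  # generate() appears to read token ids, is_encoder_decoder, etc.

    def prepare_inputs_for_generation(self, input_ids, **kwargs):
        # Turn the ids generated so far into whatever kwargs forward() expects.
        return {"input_ids": input_ids}

    def forward(self, input_ids, **kwargs):
        # Presumably has to return something exposing .logits of shape (batch, seq, vocab)?
        raise NotImplementedError
```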

To make the discussion specific, and generally useful: how could Huggingface's beam search be used with minGPT, which has a forward() function that returns (logits, loss)? (It actually has its own generate() function that does the equivalent of Huggingface's sample() and greedy_search(), but has no beam search support.) Or nanoGPT if you prefer - they are identical in this area.

In the comments I said "It seems everyone's generate/beam search implementation is tied in closely with their transformer implementation...", and I still can't really see why everyone reinvents this wheel, or why there is no standalone open-source beam search implementation with a clearly defined interface. I'm going to throw a bounty at this question, to see if it helps.

Answer 1

Score: 1

If you want to use Huggingface's code, what you're looking for is the generate() method from the GenerationMixin class, see here.

So your options are either to adapt your code to inherit from the GenerationMixin class, or to copy the relevant code over. Either way, it depends on your model being Huggingface-friendly, so just plugging in a random model without adjusting the code won't work.
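To make the first option concrete, here is a minimal, untested sketch of wrapping a minGPT-style model (whose forward(idx) returns (logits, loss)) so that transformers' generate() can drive it, beam search included. It assumes a reasonably recent transformers release; the MinGPTConfig / MinGPTForGeneration names and the usage variables gpt and idx are illustrative, not part of either library, and the exact hooks generate() expects do shift between versions:

```python
from transformers import GenerationMixin, PretrainedConfig, PreTrainedModel
from transformers.modeling_outputs import CausalLMOutput


class MinGPTConfig(PretrainedConfig):
    """Illustrative config; holds only what the wrapper below actually reads."""
    model_type = "mingpt"

    def __init__(self, vocab_size=50257, block_size=1024, **kwargs):
        self.vocab_size = vocab_size
        self.block_size = block_size
        super().__init__(**kwargs)


class MinGPTForGeneration(PreTrainedModel, GenerationMixin):
    # Explicit GenerationMixin base: newer transformers releases are moving away from
    # pulling it in via PreTrainedModel automatically; on older releases it is harmless.
    config_class = MinGPTConfig
    main_input_name = "input_ids"

    def __init__(self, config, mingpt_model):
        super().__init__(config)
        self.model = mingpt_model  # the wrapped minGPT/nanoGPT nn.Module

    def prepare_inputs_for_generation(self, input_ids, **kwargs):
        # No KV cache here, so just re-run the full (cropped) sequence every step.
        return {"input_ids": input_ids[:, -self.config.block_size:]}

    def forward(self, input_ids, **kwargs):
        # minGPT's forward(idx) returns (logits, loss); generate() only needs .logits.
        logits, _ = self.model(input_ids)
        return CausalLMOutput(logits=logits)


# Hypothetical usage, given a trained minGPT instance `gpt` and a prompt tensor `idx`:
# wrapper = MinGPTForGeneration(MinGPTConfig(), gpt)
# out = wrapper.generate(idx, max_new_tokens=20, num_beams=4, do_sample=False,
#                        pad_token_id=0)  # minGPT has no pad token, so pick any id
```

Beam search then comes via num_beams; since the sketch has no KV cache, every step re-runs the full sequence, which is simple but slow.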

If you don't necessarily want to use Huggingface's code, there are plenty of very convenient implementations on GitHub that are easier to adapt, for example here.
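And if the goal is really just a standalone beam search with a clearly defined interface, that interface can be as small as "a function from a batch of prefixes to next-token log-probabilities". Below is a bare-bones sketch under that assumption (no length penalty, no sampling, no KV cache, only crude handling of finished beams); beam_search and the toy fake model are my own names, not from any library:

```python
import torch


def beam_search(step_log_probs, prefix, max_new_tokens, num_beams, eos_id=None):
    """Minimal beam search.

    step_log_probs: callable mapping a (n_beams, seq_len) LongTensor of prefixes to
        (n_beams, vocab_size) next-token log-probabilities.
    prefix: (seq_len,) LongTensor holding the prompt.
    Returns the highest-scoring sequence as a 1-D LongTensor.
    """
    beams = prefix.unsqueeze(0)                           # (1, seq_len)
    scores = torch.zeros(1, device=prefix.device)         # cumulative log-prob per beam
    for _ in range(max_new_tokens):
        log_probs = step_log_probs(beams)                 # (n_beams, vocab)
        vocab = log_probs.size(-1)
        cand = scores.unsqueeze(1) + log_probs            # score of every continuation
        scores, flat_idx = cand.view(-1).topk(num_beams)  # best (beam, token) pairs overall
        beam_idx, token_idx = flat_idx // vocab, flat_idx % vocab
        beams = torch.cat([beams[beam_idx], token_idx.unsqueeze(1)], dim=1)
        if eos_id is not None and (beams[:, -1] == eos_id).all():
            break
    return beams[scores.argmax()]


if __name__ == "__main__":
    # Toy demo with a fake "model": random logits over a 10-token vocabulary.
    torch.manual_seed(0)

    def fake(beams):
        return torch.log_softmax(torch.randn(beams.size(0), 10), dim=-1)

    print(beam_search(fake, torch.tensor([1, 2, 3]), max_new_tokens=5, num_beams=3))
```

Plugging in a minGPT-style model would then just mean passing something like lambda b: torch.log_softmax(model(b)[0][:, -1, :], dim=-1) as step_log_probs.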
