How to use Huggingface GenerationMixin (or its beam search) with my own model?

Question

Huggingface's use of a mixin keeps teasing me that this should be possible, but I can't find any clear documentation on exactly what the requirements are, or whether the dependencies are just too much for it to be worth it. The central module is literally thousands and thousands of lines, and from studying it yesterday I feel I've learnt more about how to write beam search than about GenerationMixin.

From reading the source, I think the dependencies are self.config, then prepare_inputs_for_generation() and _update_model_kwargs_for_generation(), and, implicitly, forward(). But I'm not sure that is everything, nor what each of them should look like. And I think it may expect forward() to return data in a specific format.
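For what it's worth, my best guess at the shape of that contract is the sketch below (completely unverified; MyDecoderOnlyModel is made up, and confirming or correcting these signatures is essentially what I'm asking):

```python
# My unverified guess at the interface generate() wants; MyDecoderOnlyModel is made up.
import torch.nn as nn
from transformers import PretrainedConfig


class MyDecoderOnlyModel(nn.Module):
    def __init__(self, config: PretrainedConfig):
        super().__init__()
        self.config = config  # generate() appears to read token ids, is_encoder_decoder, etc.

    def prepare_inputs_for_generation(self, input_ids, **kwargs):
        # Turn the ids generated so far into whatever kwargs forward() expects.
        return {"input_ids": input_ids}

    def forward(self, input_ids, **kwargs):
        # Presumably has to return something exposing .logits of shape (batch, seq, vocab)?
        raise NotImplementedError
```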

To make the discussion specific, and generally useful: how could Huggingface's beam search be used with minGPT, which has a forward() function that returns (logits, loss)? (It actually has its own generate() function that does the equivalent of Huggingface's sample() and greedy_search(), but has no beam search support.) Or nanoGPT if you prefer - they are identical in this area.

In the comments I said "It seems everyone's generate/beam search implementation is tied in closely with their transformer implementation...", and I still can't really see why everyone reinvents this wheel, or why there is no standalone open-source beam search implementation with a clearly defined interface. I'm going to throw a bounty at this question, to see if it helps.

Answer 1

Score: 1

If you want to use Huggingface's code, what you're looking for is the generate() method from the GenerationMixin class, see here.

So your options are either to adapt your code to inherit from the GenerationMixin class, or to copy the relevant code over. Either way, it depends on your model being Huggingface-friendly, so just plugging in a random model without adjusting the code won't work.
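To make the first option concrete, here is a minimal, untested sketch of wrapping a minGPT-style model (whose forward(idx) returns (logits, loss)) so that transformers' generate() can drive it, beam search included. It assumes a reasonably recent transformers release; the MinGPTConfig / MinGPTForGeneration names and the usage variables gpt and idx are illustrative, not part of either library, and the exact hooks generate() expects do shift between versions:

```python
from transformers import GenerationMixin, PretrainedConfig, PreTrainedModel
from transformers.modeling_outputs import CausalLMOutput


class MinGPTConfig(PretrainedConfig):
    """Illustrative config; holds only what the wrapper below actually reads."""
    model_type = "mingpt"

    def __init__(self, vocab_size=50257, block_size=1024, **kwargs):
        self.vocab_size = vocab_size
        self.block_size = block_size
        super().__init__(**kwargs)


class MinGPTForGeneration(PreTrainedModel, GenerationMixin):
    # Explicit GenerationMixin base: newer transformers releases are moving away from
    # pulling it in via PreTrainedModel automatically; on older releases it is harmless.
    config_class = MinGPTConfig
    main_input_name = "input_ids"

    def __init__(self, config, mingpt_model):
        super().__init__(config)
        self.model = mingpt_model  # the wrapped minGPT/nanoGPT nn.Module

    def prepare_inputs_for_generation(self, input_ids, **kwargs):
        # No KV cache here, so just re-run the full (cropped) sequence every step.
        return {"input_ids": input_ids[:, -self.config.block_size:]}

    def forward(self, input_ids, **kwargs):
        # minGPT's forward(idx) returns (logits, loss); generate() only needs .logits.
        logits, _ = self.model(input_ids)
        return CausalLMOutput(logits=logits)


# Hypothetical usage, given a trained minGPT instance `gpt` and a prompt tensor `idx`:
# wrapper = MinGPTForGeneration(MinGPTConfig(), gpt)
# out = wrapper.generate(idx, max_new_tokens=20, num_beams=4, do_sample=False,
#                        pad_token_id=0)  # minGPT has no pad token, so pick any id
```

Beam search then comes via num_beams; since the sketch has no KV cache, every step re-runs the full sequence, which is simple but slow.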

If you don't necessarily want to use Huggingface's code, there are plenty of very convenient implementations on GitHub that are easier to adapt, for example here.
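And if the goal is really just a standalone beam search with a clearly defined interface, that interface can be as small as "a function from a batch of prefixes to next-token log-probabilities". Below is a bare-bones sketch under that assumption (no length penalty, no sampling, no KV cache, only crude handling of finished beams); beam_search and the toy fake model are my own names, not from any library:

```python
import torch


def beam_search(step_log_probs, prefix, max_new_tokens, num_beams, eos_id=None):
    """Minimal beam search.

    step_log_probs: callable mapping a (n_beams, seq_len) LongTensor of prefixes to
        (n_beams, vocab_size) next-token log-probabilities.
    prefix: (seq_len,) LongTensor holding the prompt.
    Returns the highest-scoring sequence as a 1-D LongTensor.
    """
    beams = prefix.unsqueeze(0)                           # (1, seq_len)
    scores = torch.zeros(1, device=prefix.device)         # cumulative log-prob per beam
    for _ in range(max_new_tokens):
        log_probs = step_log_probs(beams)                 # (n_beams, vocab)
        vocab = log_probs.size(-1)
        cand = scores.unsqueeze(1) + log_probs            # score of every continuation
        scores, flat_idx = cand.view(-1).topk(num_beams)  # best (beam, token) pairs overall
        beam_idx, token_idx = flat_idx // vocab, flat_idx % vocab
        beams = torch.cat([beams[beam_idx], token_idx.unsqueeze(1)], dim=1)
        if eos_id is not None and (beams[:, -1] == eos_id).all():
            break
    return beams[scores.argmax()]


if __name__ == "__main__":
    # Toy demo with a fake "model": random logits over a 10-token vocabulary.
    torch.manual_seed(0)

    def fake(beams):
        return torch.log_softmax(torch.randn(beams.size(0), 10), dim=-1)

    print(beam_search(fake, torch.tensor([1, 2, 3]), max_new_tokens=5, num_beams=3))
```

Plugging in a minGPT-style model would then just mean passing something like lambda b: torch.log_softmax(model(b)[0][:, -1, :], dim=-1) as step_log_probs.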
