json2token not found when using the Donut VisionEncoderDecoderModel from Huggingface transformers
Question
I am trying to fine-tune a Donut (Document Understanding) Hugging Face Transformers model, but am getting hung up trying to create a DonutDataset object. I have the following code (running in Google Colab):
!pip install transformers datasets sentencepiece donut-python

from google.colab import drive
from donut.util import DonutDataset
from transformers import DonutProcessor, VisionEncoderDecoderModel, VisionEncoderDecoderConfig

drive.mount('/content/drive/')
projectdir = 'drive/MyDrive/donut'

donut_version = 'naver-clova-ix/donut-base-finetuned-cord-v2'  # 'naver-clova-ix/donut-base'
config = VisionEncoderDecoderConfig.from_pretrained(donut_version)
config.decoder.max_length = 768
processor = DonutProcessor.from_pretrained(donut_version)
model = VisionEncoderDecoderModel.from_pretrained(donut_version, config=config)

train_dataset = DonutDataset(f'{projectdir}/input_doc_images',
                             model,
                             # 'naver-clova-ix/donut-base-finetuned-cord-v2',
                             max_length=config.decoder.max_length,
                             split="train",
                             task_start_token="<s_cord-v2>",  # task prompt token for the cord-v2 checkpoint
                             prompt_end_token="<s_cord-v2>",
                             sort_json_key=True,
                             )
However, the last line throws the following error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-8-9d831be996e6> in <cell line: 4>()
2
3 max_length = 768
----> 4 train_dataset = DonutDataset(f'{projectdir}/input_doc_images',
5 model,
6 #'naver-clova-ix/donut-base-finetuned-cord-v2',
2 frames
/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py in __getattr__(self, name)
1612 if name in modules:
1613 return modules[name]
-> 1614 raise AttributeError("'{}' object has no attribute '{}'".format(
1615 type(self).__name__, name))
1616
AttributeError: 'VisionEncoderDecoderModel' object has no attribute 'json2token'
I'm a little confused, because my model object is a 'naver-clova-ix/donut-base-finetuned-cord-v2' model, which, according to this line from model.py in the Donut GitHub repo, does in fact seem to have a json2token method?
What am I missing?
By the way, you can view/copy my underlying data (images and a JSON-lines metadata file) from my Google Drive 'donut' folder here: https://drive.google.com/drive/folders/1Gsr7d7Exvtx5PqjZQv2nXP9-pPDUEIOx?usp=sharing
Answer 1
Score: 1
To use DonutDataset correctly, you should use the model class from donut instead of transformers; then the json2token function will work correctly, e.g.:
from donut.util import DonutDataset
from donut import DonutModel
import torch

# Load the donut (not transformers) model class, which has json2token
pretrained_model = DonutModel.from_pretrained(
    "naver-clova-ix/donut-base-finetuned-rvlcdip",
    ignore_mismatched_sizes=True)
pretrained_model.encoder.to(torch.bfloat16)  # cast the Swin encoder weights to bfloat16

max_length = 768  # matches config.decoder.max_length in the question

train_dataset = DonutDataset('my_dataset/',
                             pretrained_model,
                             max_length=max_length,
                             split="train",
                             task_start_token="<s_rvlcdip>",  # task prompt token for the rvlcdip checkpoint
                             prompt_end_token="<s_rvlcdip>",
                             sort_json_key=True,
                             )
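Since DonutDataset subclasses torch.utils.data.Dataset, the result can be wrapped in a standard PyTorch DataLoader for the fine-tuning loop. A minimal sketch (the batch size here is an arbitrary choice, not from the original answer):

from torch.utils.data import DataLoader

# DonutDataset is a torch Dataset subclass, so standard batching applies
train_dataloader = DataLoader(train_dataset, batch_size=2, shuffle=True)
batch = next(iter(train_dataloader))  # pull one batch to sanity-check shapes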
Note that the json2token function is defined on the DonutModel object in the donut repo, here: https://github.com/clovaai/donut/blob/master/donut/model.py#L498
And if we look at transformers, there is no json2token method on the VisionEncoderDecoderModel object: https://github.com/huggingface/transformers/blob/main/src/transformers/models/vision_encoder_decoder/modeling_vision_encoder_decoder.py#L151
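You can confirm this quickly from a Python session (a small sanity check, not part of the original answer):

from donut import DonutModel
from transformers import VisionEncoderDecoderModel

print(hasattr(DonutModel, "json2token"))                 # True: defined in donut's model.py
print(hasattr(VisionEncoderDecoderModel, "json2token"))  # False: no such method in transformers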
To use the model from transformers instead of donut, you need to read the data differently: skip donut.util.DonutDataset and re-create the dataset as a Hugging Face-friendly dataset, like this:
import PIL.Image
from datasets import Dataset

# Read the page images directly and build a Hugging Face datasets.Dataset
i1 = PIL.Image.open('my_dataset/alex_cannon_dep_first_page.png')
i2 = PIL.Image.open('my_dataset/mcentee_dep_first_page.png')
train_dataset = Dataset.from_dict({'images': [i1, i2]})
Then you have to do all the feature processing yourself before feeding the data to the model; see https://huggingface.co/docs/transformers/model_doc/donut
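For example, here is a minimal sketch of that manual preprocessing with DonutProcessor, following the pattern in the linked docs. The target string is a made-up illustration of Donut's flattened JSON-as-tokens format, not taken from the question's data:

import PIL.Image
from transformers import DonutProcessor

processor = DonutProcessor.from_pretrained('naver-clova-ix/donut-base-finetuned-cord-v2')

# Image -> pixel_values tensor of shape (1, 3, height, width)
image = PIL.Image.open('my_dataset/alex_cannon_dep_first_page.png').convert('RGB')
pixel_values = processor(image, return_tensors='pt').pixel_values

# Ground truth -> token ids for the labels; pad tokens are masked with -100
# so the loss ignores them
target = '<s_cord-v2><s_total>45.00</s_total></s>'  # hypothetical example sequence
labels = processor.tokenizer(target, add_special_tokens=False,
                             return_tensors='pt').input_ids
labels[labels == processor.tokenizer.pad_token_id] = -100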