json2token not found when using the Donut VisionEncoderDecoderModel from Huggingface transformers

Question

I am trying to fine-tune a Donut (Document Understanding) Huggingface Transformers model, but am getting hung up trying to create a DonutDataset object. I have the following code (running in Google Colab):

!pip install transformers datasets sentencepiece donut-python

from google.colab import drive
from donut.util import DonutDataset
from transformers import DonutProcessor, VisionEncoderDecoderModel, VisionEncoderDecoderConfig

drive.mount('/content/drive/')
projectdir = 'drive/MyDrive/donut'

donut_version = 'naver-clova-ix/donut-base-finetuned-cord-v2'
config = VisionEncoderDecoderConfig.from_pretrained(donut_version)
config.decoder.max_length = 768

processor = DonutProcessor.from_pretrained(donut_version)
model = VisionEncoderDecoderModel.from_pretrained(donut_version, config=config)

train_dataset = DonutDataset(f'{projectdir}/input_doc_images', 
                             model,
                             max_length=config.decoder.max_length,
                             split="train", 
                             task_start_token="",
                             prompt_end_token="",
                             sort_json_key=True,
                             )

However, the last line is throwing the following error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-8-9d831be996e6> in <cell line: 4>()
      2 
      3 max_length = 768
----> 4 train_dataset = DonutDataset(f'{projectdir}/input_doc_images', 
      5                              model,
      6                              max_length=config.decoder.max_length,

2 frames
/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py in __getattr__(self, name)
   1612             if name in modules:
   1613                 return modules[name]
-> 1614         raise AttributeError("'{}' object has no attribute '{}'".format(
   1615             type(self).__name__, name))
   1616 

AttributeError: 'VisionEncoderDecoderModel' object has no attribute 'json2token'

I'm a little confused because my model object is a 'naver-clova-ix/donut-base-finetuned-cord-v2' model, which, according to this line from model.py in the Donut GitHub repo (https://github.com/clovaai/donut/blob/master/donut/model.py#L498), does in fact seem to have a json2token method.

What am I missing?

By the way, you can view/copy my underlying data (images and json-lines metadata file) from my Google Drive 'donut' folder here: https://drive.google.com/drive/folders/1Gsr7d7Exvtx5PqjZQv2nXP9-pPDUEIOx?usp=sharing


Answer 1

Score: 1

To use DonutDataset correctly, you should load the model with the model class from donut rather than from transformers; then the json2token method is available, e.g.:

from donut.util import DonutDataset
from donut import DonutModel

import torch

# Load the model through the donut package; DonutModel defines json2token,
# which DonutDataset calls internally.
pretrained_model = DonutModel.from_pretrained(
    "naver-clova-ix/donut-base-finetuned-rvlcdip",
    ignore_mismatched_sizes=True)

pretrained_model.encoder.to(torch.bfloat16)

max_length = 768  # the original snippet referenced an undefined config; this is the value from the question

train_dataset = DonutDataset('my_dataset/', 
                             pretrained_model,
                             max_length=max_length,
                             split="train", 
                             task_start_token="",
                             prompt_end_token="",
                             sort_json_key=True,
                             )
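
Once the dataset builds without the error, it can be wrapped in a standard PyTorch DataLoader for fine-tuning. A minimal sketch follows; the exact tensors each item yields depend on the donut version, so treat this as an outline rather than the package's documented API:

from torch.utils.data import DataLoader

# Default collation should work here because DonutDataset items are tuples of tensors.
train_loader = DataLoader(train_dataset, batch_size=1, shuffle=True)

for batch in train_loader:
    ...  # run the forward/backward pass with pretrained_model here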

Note that the json2token function is defined in the donut repo (https://github.com/clovaai/donut/blob/master/donut/model.py#L498) on the DonutModel object.
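
For context, json2token flattens a JSON ground-truth annotation into the token sequence Donut is trained to generate. A rough sketch of the call, where the example dict and output are illustrative rather than taken from the question's data:

# Hypothetical ground-truth dict; json2token wraps each key in <s_key>...</s_key> tokens.
gt = {"menu": {"nm": "Latte", "price": "4.50"}}
seq = pretrained_model.json2token(gt, sort_json_key=True)
# seq is roughly "<s_menu><s_nm>Latte</s_nm><s_price>4.50</s_price></s_menu>"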

And if we look at transformers, there is no json2token (https://github.com/huggingface/transformers/blob/main/src/transformers/models/vision_encoder_decoder/modeling_vision_encoder_decoder.py#L151) on the VisionEncoderDecoderModel object.

To use the model from transformers instead of donut, you need to read the data differently: skip donut.util.DonutDataset and build the data into a Huggingface-friendly Dataset yourself, like this:

import PIL.Image
from datasets import Dataset

i1 = PIL.Image.open('my_dataset/alex_cannon_dep_first_page.png')
i2 = PIL.Image.open('my_dataset/mcentee_dep_first_page.png')

train_dataset = Dataset.from_dict({'images': [i1, i2]})

Then you have to do all the feature processing yourself before feeding the data to the model; see the Donut docs: https://huggingface.co/docs/transformers/model_doc/donut
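
A minimal sketch of what that processing could look like with the DonutProcessor from the question; target_sequences is a hypothetical list of ground-truth token strings for the two images, and padding/label-masking details are simplified:

target_sequences = ["<s_cord-v2>...</s_cord-v2>", "<s_cord-v2>...</s_cord-v2>"]  # hypothetical ground truths

def preprocess(example, idx):
    # DonutProcessor converts the PIL image into model-ready pixel_values...
    pixel_values = processor(example["images"], return_tensors="pt").pixel_values
    # ...and its tokenizer turns the target string into decoder label ids.
    labels = processor.tokenizer(
        target_sequences[idx],
        max_length=768,
        padding="max_length",
        truncation=True,
        return_tensors="pt",
    ).input_ids
    return {"pixel_values": pixel_values.squeeze(0), "labels": labels.squeeze(0)}

train_dataset = train_dataset.map(preprocess, with_indices=True)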

