json2token not found when using the Donut VisionEncoderDecoderModel from Huggingface transformers

Question

I am trying to fine-tune a Donut (Document Understanding) Huggingface Transformer model, but I'm getting hung up trying to create a DonutDataset object. I have the following code (running in Google Colab):

```python
!pip install transformers datasets sentencepiece donut-python
from google.colab import drive
from donut.util import DonutDataset
from transformers import DonutProcessor, VisionEncoderDecoderModel, VisionEncoderDecoderConfig

drive.mount('/content/drive/')
projectdir = 'drive/MyDrive/donut'
donut_version = 'naver-clova-ix/donut-base-finetuned-cord-v2'
config = VisionEncoderDecoderConfig.from_pretrained(donut_version)
config.decoder.max_length = 768
processor = DonutProcessor.from_pretrained(donut_version)
model = VisionEncoderDecoderModel.from_pretrained(donut_version, config=config)
train_dataset = DonutDataset(f'{projectdir}/input_doc_images',
                             model,
                             max_length=config.decoder.max_length,
                             split="train",
                             task_start_token="",
                             prompt_end_token="",
                             sort_json_key=True,
                             )
```

However, the last line is throwing the following error:

```
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-8-9d831be996e6> in <cell line: 4>()
      2
      3 max_length = 768
----> 4 train_dataset = DonutDataset(f'{projectdir}/input_doc_images',
      5                              model,
      6                              max_length=config.decoder.max_length,

2 frames
/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py in __getattr__(self, name)
   1612         if name in modules:
   1613             return modules[name]
-> 1614         raise AttributeError("'{}' object has no attribute '{}'".format(
   1615             type(self).__name__, name))
   1616

AttributeError: 'VisionEncoderDecoderModel' object has no attribute 'json2token'
```

I'm a little confused, because my model object is a 'naver-clova-ix/donut-base-finetuned-cord-v2' model, which, according to this line from model.py in the Donut GitHub repo (https://github.com/clovaai/donut/blob/master/donut/model.py#L498), does in fact appear to have a json2token method.

What am I missing?

By the way, you can view/copy my underlying data (images and json-lines metadata file) from my Google Drive 'donut' folder here: https://drive.google.com/drive/folders/1Gsr7d7Exvtx5PqjZQv2nXP9-pPDUEIOx?usp=sharing

Answer 1

Score: 1

To use DonutDataset correctly, you should use the model class from donut instead of transformers; then the json2token function works as expected, e.g.:

```python
from donut.util import DonutDataset
from donut import DonutModel
import torch

pretrained_model = DonutModel.from_pretrained(
    "naver-clova-ix/donut-base-finetuned-rvlcdip",
    ignore_mismatched_sizes=True)
pretrained_model.encoder.to(torch.bfloat16)

train_dataset = DonutDataset('my_dataset/',
                             pretrained_model,
                             max_length=768,  # e.g. the decoder max_length from the question
                             split="train",
                             task_start_token="",
                             prompt_end_token="",
                             sort_json_key=True,
                             )
```

Note that the json2token function is defined in the donut repo (https://github.com/clovaai/donut/blob/master/donut/model.py#L498) on the DonutModel object.

And if we look at transformers (https://github.com/huggingface/transformers/blob/main/src/transformers/models/vision_encoder_decoder/modeling_vision_encoder_decoder.py#L151), there is no json2token on the VisionEncoderDecoderModel object.
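To see roughly what the missing method does, here is a simplified sketch of donut's json2token (an assumption based on the clovaai/donut implementation linked above; the real method also registers the `<s_…>` markers as new special tokens with the tokenizer):

```python
# Simplified sketch: flatten a (possibly nested) JSON annotation into a
# token sequence by wrapping each key's value in <s_key> ... </s_key>
# markers and joining list items with <sep/>.
def json2token(obj, sort_json_key=True):
    if isinstance(obj, dict):
        # donut sorts keys in reverse order when sort_json_key is True
        keys = sorted(obj.keys(), reverse=True) if sort_json_key else obj.keys()
        return "".join(
            f"<s_{k}>" + json2token(obj[k], sort_json_key) + f"</s_{k}>"
            for k in keys
        )
    if isinstance(obj, list):
        return "<sep/>".join(json2token(item, sort_json_key) for item in obj)
    return str(obj)

print(json2token({"menu": [{"nm": "coffee", "price": "3.00"}]}))
# → <s_menu><s_price>3.00</s_price><s_nm>coffee</s_nm></s_menu>
```

DonutDataset calls this on the ground-truth JSON of each sample to build the decoder's target sequence, which is why it requires a model object that actually has the method.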

To use the model from transformers instead of donut, you would need to read the data differently: skip donut.util.DonutDataset and re-create the dataset as a Huggingface-friendly dataset, like this:

```python
import PIL.Image
from datasets import Dataset

i1 = PIL.Image.open('my_dataset/alex_cannon_dep_first_page.png')
i2 = PIL.Image.open('my_dataset/mcentee_dep_first_page.png')
train_dataset = Dataset.from_dict({'images': [i1, i2]})
```

Then you have to do all the feature processing yourself before feeding it to the model; see https://huggingface.co/docs/transformers/model_doc/donut
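That manual processing might look like the following sketch (assumptions: the DonutProcessor from the question is used, the blank image is a stand-in for a real document scan, and `example_target` is a hypothetical target sequence you would build yourself, e.g. with a json2token-style conversion):

```python
import PIL.Image
from transformers import DonutProcessor

processor = DonutProcessor.from_pretrained("naver-clova-ix/donut-base-finetuned-cord-v2")
image = PIL.Image.new("RGB", (960, 1280), "white")  # stand-in for a document image

# Encoder input: resized + normalized pixel tensor
pixel_values = processor(image, return_tensors="pt").pixel_values

# Decoder labels: tokenized target sequence, with padding masked out of the loss
example_target = "<s_cord-v2><s_total>18.00</s_total></s>"  # hypothetical target
labels = processor.tokenizer(
    example_target,
    max_length=768,
    padding="max_length",
    truncation=True,
    return_tensors="pt",
).input_ids
labels[labels == processor.tokenizer.pad_token_id] = -100  # ignore pad tokens in the loss
```

The `pixel_values`/`labels` pair is what a VisionEncoderDecoderModel expects in its forward pass during training.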


huangapple
  • Posted 2023-06-09 11:58:30
  • When reposting, please keep this link: https://go.coder-hub.com/76437117.html