json2token not found when using the Donut VisionEncoderDecoderModel from Huggingface transformers
Question
I am trying to fine-tune a Donut (Document Understanding) Hugging Face Transformers model, but am getting hung up trying to create a DonutDataset object. I have the following code (running in Google Colab):
!pip install transformers datasets sentencepiece donut-python

from google.colab import drive
from donut.util import DonutDataset
from transformers import DonutProcessor, VisionEncoderDecoderModel, VisionEncoderDecoderConfig

drive.mount('/content/drive/')
projectdir = 'drive/MyDrive/donut'

donut_version = 'naver-clova-ix/donut-base-finetuned-cord-v2'  # 'naver-clova-ix/donut-base'
config = VisionEncoderDecoderConfig.from_pretrained(donut_version)
config.decoder.max_length = 768
processor = DonutProcessor.from_pretrained(donut_version)
model = VisionEncoderDecoderModel.from_pretrained(donut_version, config=config)

train_dataset = DonutDataset(f'{projectdir}/input_doc_images',
                             model,
                             # 'naver-clova-ix/donut-base-finetuned-cord-v2',
                             max_length=config.decoder.max_length,
                             split="train",
                             task_start_token="<s_cord-v2>",  # task prompt token for the cord-v2 checkpoint
                             prompt_end_token="<s_cord-v2>",
                             sort_json_key=True,
                             )
However, the last line throws the following error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-8-9d831be996e6> in <cell line: 4>()
2
3 max_length = 768
----> 4 train_dataset = DonutDataset(f'{projectdir}/input_doc_images',
5 model,
6 #'naver-clova-ix/donut-base-finetuned-cord-v2',
2 frames
/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py in __getattr__(self, name)
1612 if name in modules:
1613 return modules[name]
-> 1614 raise AttributeError("'{}' object has no attribute '{}'".format(
1615 type(self).__name__, name))
1616
AttributeError: 'VisionEncoderDecoderModel' object has no attribute 'json2token'
I'm a little confused, because my model object is a 'naver-clova-ix/donut-base-finetuned-cord-v2' model, which, according to this line from model.py in the Donut GitHub repo, does in fact seem to have a json2token method?
What am I missing?
By the way, you can view/copy my underlying data (images and a JSON-lines metadata file) from my Google Drive 'donut' folder here: https://drive.google.com/drive/folders/1Gsr7d7Exvtx5PqjZQv2nXP9-pPDUEIOx?usp=sharing
Answer 1
Score: 1
To use DonutDataset correctly, you should use the model class from donut instead of transformers; then the json2token function will work correctly, e.g.:
from donut.util import DonutDataset
from donut import DonutModel
import torch

# Load the donut (not transformers) model class, which has json2token
pretrained_model = DonutModel.from_pretrained(
    "naver-clova-ix/donut-base-finetuned-rvlcdip",
    ignore_mismatched_sizes=True)
pretrained_model.encoder.to(torch.bfloat16)  # cast the Swin encoder weights to bfloat16

max_length = 768  # matches config.decoder.max_length in the question

train_dataset = DonutDataset('my_dataset/',
                             pretrained_model,
                             max_length=max_length,
                             split="train",
                             task_start_token="<s_rvlcdip>",  # task prompt token for the rvlcdip checkpoint
                             prompt_end_token="<s_rvlcdip>",
                             sort_json_key=True,
                             )
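Since DonutDataset subclasses torch.utils.data.Dataset, the result can be wrapped in a standard PyTorch DataLoader for the fine-tuning loop. A minimal sketch (the batch size here is an arbitrary choice, not from the original answer):

from torch.utils.data import DataLoader

# DonutDataset is a torch Dataset subclass, so standard batching applies
train_dataloader = DataLoader(train_dataset, batch_size=2, shuffle=True)
batch = next(iter(train_dataloader))  # pull one batch to sanity-check shapes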
Note that the json2token function is defined on the DonutModel object in the donut repo, here: https://github.com/clovaai/donut/blob/master/donut/model.py#L498
And if we look at transformers, there is no json2token method on the VisionEncoderDecoderModel object: https://github.com/huggingface/transformers/blob/main/src/transformers/models/vision_encoder_decoder/modeling_vision_encoder_decoder.py#L151
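You can confirm this quickly from a Python session (a small sanity check, not part of the original answer):

from donut import DonutModel
from transformers import VisionEncoderDecoderModel

print(hasattr(DonutModel, "json2token"))                 # True: defined in donut's model.py
print(hasattr(VisionEncoderDecoderModel, "json2token"))  # False: no such method in transformers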
To use the model from transformers instead of donut, you need to read the data differently: skip donut.util.DonutDataset and re-create the dataset as a Hugging Face-friendly dataset, like this:
import PIL.Image
from datasets import Dataset

# Read the page images directly and build a Hugging Face datasets.Dataset
i1 = PIL.Image.open('my_dataset/alex_cannon_dep_first_page.png')
i2 = PIL.Image.open('my_dataset/mcentee_dep_first_page.png')
train_dataset = Dataset.from_dict({'images': [i1, i2]})
Then you have to do all the feature processing yourself before feeding the data to the model; see https://huggingface.co/docs/transformers/model_doc/donut
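For example, here is a minimal sketch of that manual preprocessing with DonutProcessor, following the pattern in the linked docs. The target string is a made-up illustration of Donut's flattened JSON-as-tokens format, not taken from the question's data:

import PIL.Image
from transformers import DonutProcessor

processor = DonutProcessor.from_pretrained('naver-clova-ix/donut-base-finetuned-cord-v2')

# Image -> pixel_values tensor of shape (1, 3, height, width)
image = PIL.Image.open('my_dataset/alex_cannon_dep_first_page.png').convert('RGB')
pixel_values = processor(image, return_tensors='pt').pixel_values

# Ground truth -> token ids for the labels; pad tokens are masked with -100
# so the loss ignores them
target = '<s_cord-v2><s_total>45.00</s_total></s>'  # hypothetical example sequence
labels = processor.tokenizer(target, add_special_tokens=False,
                             return_tensors='pt').input_ids
labels[labels == processor.tokenizer.pad_token_id] = -100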