json2token not found when using the Donut VisionEncoderDecoderModel from Huggingface transformers
Question
I am trying to fine-tune a Donut (Document Understanding) Huggingface Transformer model, but I'm getting hung up trying to create a DonutDataset object. I have the following code (running in Google Colab):
!pip install transformers datasets sentencepiece donut-python

from google.colab import drive
from donut.util import DonutDataset
from transformers import DonutProcessor, VisionEncoderDecoderModel, VisionEncoderDecoderConfig

drive.mount('/content/drive/')
projectdir = 'drive/MyDrive/donut'

donut_version = 'naver-clova-ix/donut-base-finetuned-cord-v2'  # 'naver-clova-ix/donut-base'
config = VisionEncoderDecoderConfig.from_pretrained(donut_version)
config.decoder.max_length = 768
processor = DonutProcessor.from_pretrained(donut_version)
model = VisionEncoderDecoderModel.from_pretrained(donut_version, config=config)

train_dataset = DonutDataset(f'{projectdir}/input_doc_images',
                             model,
                             # 'naver-clova-ix/donut-base-finetuned-cord-v2',
                             max_length=config.decoder.max_length,
                             split="train",
                             task_start_token="",
                             prompt_end_token="",
                             sort_json_key=True,
                             )
However, the last line is throwing the following error:
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-8-9d831be996e6> in <cell line: 4>()
      2
      3 max_length = 768
----> 4 train_dataset = DonutDataset(f'{projectdir}/input_doc_images',
      5                              model,
      6                              # 'naver-clova-ix/donut-base-finetuned-cord-v2',

2 frames
/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py in __getattr__(self, name)
   1612         if name in modules:
   1613             return modules[name]
-> 1614         raise AttributeError("'{}' object has no attribute '{}'".format(
   1615             type(self).__name__, name))
   1616

AttributeError: 'VisionEncoderDecoderModel' object has no attribute 'json2token'
I'm a little confused, because my model object was loaded from 'naver-clova-ix/donut-base-finetuned-cord-v2', and this line from the model.py of the Donut GitHub repo seems to suggest that it does in fact have a json2token method.
What am I missing?
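For reference, a quick attribute check on the object I constructed (just a sanity check against the traceback above) shows the method really is missing:

print(type(model).__name__)          # VisionEncoderDecoderModel
print(hasattr(model, 'json2token'))  # False, consistent with the error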
By the way, you can view/copy my underlying data (images and json-lines metadata file) from my Google Drive 'donut' folder here: https://drive.google.com/drive/folders/1Gsr7d7Exvtx5PqjZQv2nXP9-pPDUEIOx?usp=sharing
Answer 1
Score: 1
To use DonutDataset correctly, you should use the model class from donut instead of transformers; the json2token method then works as expected, e.g.:
from donut.util import DonutDataset
from donut import DonutModel
import torch

pretrained_model = DonutModel.from_pretrained(
    "naver-clova-ix/donut-base-finetuned-rvlcdip",
    ignore_mismatched_sizes=True)
pretrained_model.encoder.to(torch.bfloat16)

max_length = 768  # `config` is not defined in this snippet, so set the length directly
train_dataset = DonutDataset('my_dataset/',
                             pretrained_model,
                             max_length=max_length,
                             split="train",
                             task_start_token="",
                             prompt_end_token="",
                             sort_json_key=True,
                             )
Note that the json2token method is defined on the DonutModel object in the donut repo, here: https://github.com/clovaai/donut/blob/master/donut/model.py#L498
And if we look at transformers, there's no json2token on the VisionEncoderDecoderModel object: https://github.com/huggingface/transformers/blob/main/src/transformers/models/vision_encoder_decoder/modeling_vision_encoder_decoder.py#L151
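For illustration, this is roughly what DonutDataset relies on internally. This is a minimal sketch based on the linked source: the annotation dict here is hypothetical, and the exact special-token format may vary between versions of the donut repo.

# json2token flattens a ground-truth JSON annotation into the decoder's
# target sequence, wrapping each key in <s_key> ... </s_key> special tokens
gt_json = {"class": "invoice"}  # hypothetical annotation
target_sequence = pretrained_model.json2token(gt_json, sort_json_key=True)
print(target_sequence)  # e.g. "<s_class>invoice</s_class>"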
To use the model from transformers instead of donut, you need to read the data differently: don't use donut.util.DonutDataset, and instead re-create the dataset as a Huggingface-friendly dataset, like this:
import PIL.Image
from datasets import Dataset

# Load the raw page images and wrap them in a Huggingface Dataset
i1 = PIL.Image.open('my_dataset/alex_cannon_dep_first_page.png')
i2 = PIL.Image.open('my_dataset/mcentee_dep_first_page.png')
train_dataset = Dataset.from_dict({'images': [i1, i2]})
Then you have to do all the feature processing yourself before feeding the data to the model; see https://huggingface.co/docs/transformers/model_doc/donut for details.
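As a rough sketch of that manual preprocessing, assuming the processor and model loaded in the question, and using '<s_cord-v2>' as the task start token (per that checkpoint's model card; a real training target would append the flattened annotation string):

# Encoder input: the page image as a normalized pixel tensor
pixel_values = processor(i1, return_tensors='pt').pixel_values  # (1, 3, H, W)

# Decoder target: token ids for the target sequence (here just the task token)
labels = processor.tokenizer('<s_cord-v2>', add_special_tokens=False,
                             return_tensors='pt').input_ids

# A forward pass with labels returns the training loss
outputs = model(pixel_values=pixel_values, labels=labels)
print(outputs.loss)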