February 13, 2023, 23:26:37

I would like to fine-tune the BLIP model on the ROCO dataset for image captioning of chest X-rays

Question

I want to fine-tune the BLIP model on the ROCO dataset for captioning chest X-ray images, but I am getting an error about integer indexing.

Can anyone please help me understand the cause of the error and how to fix it?

This is the code:

import os

import evaluate
import numpy as np
import pandas as pd
from PIL import Image
from transformers import BlipForConditionalGeneration, BlipProcessor, Trainer, TrainingArguments


def read_data(filepath, csv_path, n_samples):
    # Collect grayscale images and captions for rows whose caption mentions a chest X-ray.
    df = pd.read_csv(csv_path)
    images = []
    capts = []
    for idx in range(len(df)):
        if 'hest x-ray' in df['caption'][idx] or 'hest X-ray' in df['caption'][idx]:
            if len(images) > n_samples:
                break
            else:
                images.append(Image.open(os.path.join(filepath, df['name'][idx])).convert('L'))
                capts.append(df['caption'][idx])
    return images, capts


def get_data():
    imgtrpath = 'all_data/train/radiology/images'
    trcsvpath = 'all_data/train/radiology/traindata.csv'
    imgtspath = 'all_data/test/radiology/images'
    tscsvpath = 'all_data/test/radiology/testdata.csv'
    imgvalpath = 'all_data/validation/radiology/images'
    valcsvpath = 'all_data/validation/radiology/valdata.csv'
    print('Extracting Training Data')
    trainimgs, traincapts = read_data(imgtrpath, trcsvpath, 1800)
    print('Extracting Testing Data')
    testimgs, testcapts = read_data(imgtspath, tscsvpath, 100)
    print('Extracting Validation Data')
    valimgs, valcapts = read_data(imgvalpath, valcsvpath, 100)
    return trainimgs, traincapts, testimgs, testcapts, valimgs, valcapts


def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return metric.compute(predictions=predictions, references=labels)


trainimgs, traincapts, testimgs, testcapts, valimgs, valcapts = get_data()
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-large")
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-large")
metric = evaluate.load("accuracy")

# The processor output is one batch-level object; it is passed to Trainer as-is.
traindata = processor(text=traincapts, images=trainimgs, return_tensors="pt", padding=True, truncation=True)
evaldata = processor(text=testcapts, images=testimgs, return_tensors="pt", padding=True, truncation=True)
training_args = TrainingArguments(output_dir="test_trainer", evaluation_strategy="epoch")
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=traindata,
    eval_dataset=evaldata,
    compute_metrics=compute_metrics
)
trainer.train()

The code is meant to fine-tune the BLIP model on chest X-ray images from the ROCO dataset for image captioning.
But when I run it, I get this error:

  File "C:\Users\omair\anaconda3\envs\torch\lib\site-packages\transformers\feature_extraction_utils.py", line 86, in __getitem__
    raise KeyError("Indexing with integers is not available when using Python based feature extractors")
KeyError: 'Indexing with integers is not available when using Python based feature extractors'

Answer 1

Score: 0

There are two issues here:

1. You're not providing the labels during training; your ...capts are passed as the model's "Question". There is an example of how to do that in the link below.
2. Fine-tuning HF's BlipForConditionalGeneration is not supported at the moment; see https://discuss.huggingface.co/t/finetune-blip-on-customer-dataset-20893/28446, where they just fixed BlipForQuestionAnswering. If you create a dataset based on this link, you will also get the error ValueError: Expected input batch_size (0) to match target batch_size (511), which can be solved if you put in the effort to reproduce the changes made to BlipForQuestionAnswering in BlipForConditionalGeneration.
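
As for the KeyError itself: Trainer fetches samples one at a time with integer indices (train_dataset[0], train_dataset[1], ...), while the object returned by the processor is a single batch keyed by strings such as "input_ids" and "pixel_values". Below is a minimal sketch of one way to bridge the two and attach labels, as the first point suggests. The names CaptionDataset and fake_encoding are hypothetical, and using the caption's own input_ids as labels is an assumption about the training target, not something confirmed by the thread. Whether the model then computes a usable loss depends on the installed transformers version, per the second point.

```python
class CaptionDataset:
    """Map-style dataset: returns one dict per sample instead of one dict per batch."""

    def __init__(self, encoding):
        # `encoding` is any mapping from field name to an indexable batch,
        # e.g. the object returned by processor(text=..., images=..., return_tensors="pt").
        self.encoding = dict(encoding.items())

    def __len__(self):
        return len(self.encoding["input_ids"])

    def __getitem__(self, idx):
        # Slice every field at the same integer position.
        item = {name: values[idx] for name, values in self.encoding.items()}
        # Assumption: the captioning target is the caption's own token ids.
        item["labels"] = item["input_ids"]
        return item


# Stand-in for processor output so the sketch runs without downloading a model.
fake_encoding = {
    "pixel_values": [[0.1, 0.2], [0.3, 0.4]],
    "input_ids": [[101, 7, 102], [101, 9, 102]],
    "attention_mask": [[1, 1, 1], [1, 1, 1]],
}
dataset = CaptionDataset(fake_encoding)
print(len(dataset), sorted(dataset[1]))
```

With the real code, train_dataset=CaptionDataset(traindata) would take the place of train_dataset=traindata, and likewise for evaldata.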
