Running a pre-trained ONNX model - image recognition
Question
I am trying to run a pre-trained ONNX model (trained in a third-party labeling tool) for image recognition. The model was trained on labels pre-defined in the tool. My next aim is to run the model outside the tool, so I am passing a sample image through the model to get the identified labels as output. I am stuck on how to adjust the inputs. The model needs inputs as follows:
How can I adjust my inputs in the following code?
import cv2
import numpy as np
import onnxruntime
import pytesseract
import PyPDF2
# Load the image
image = cv2.imread("example.jpg")
# Check if the image has been loaded successfully
if image is None:
    raise ValueError("Failed to load the image")
# Get the shape of the image
height, width = image.shape[:2]
# Make sure the height and width are positive
if height <= 0 or width <= 0:
    raise ValueError("Invalid image size")
# Set the desired size of the resized image
dsize = (640, 640)
# Resize the image using cv2.resize
resized_image = cv2.resize(image, dsize)
# Display the resized image
cv2.imshow("Resized Image", resized_image)
cv2.waitKey(0)
cv2.destroyAllWindows()
# Load the ONNX model
session = onnxruntime.InferenceSession("ic/model.onnx")
# Check if the model has been loaded successfully
if session is None:
    raise ValueError("Failed to load the model")
# Get the input names and shapes of the model
inputs = session.get_inputs()
for i, input_info in enumerate(inputs):
    print(f"Input {i}: name = {input_info.name}, shape = {input_info.shape}")
# Run the ONNX model
input_name = session.get_inputs()[0].name
output_name = session.get_outputs()[0].name
prediction = session.run([output_name], {input_name: image})[0]
# Postprocess the prediction to obtain the labels
labels = postprocess(prediction)
# Use PyTesseract to extract the text from the image
text = pytesseract.image_to_string(image)
# Print the labels and the text
print("Labels:", labels)
print("Text:", text)
I am asking because the code throws the following error:
ValueError: Model requires 4 inputs. Input Feed contains 1
Answer 1
Score: 1
In your case, you need to add a batch dimension to the input. According to your report, your image has shape ('sequence', 640, 640), but the trained model's input is ('batch', 'sequence', 224, 224).
To fix this, add the batch dimension and transpose the tensor, for example:
img_batch = np.expand_dims(img_normalized, axis=0)
img_transposed = np.transpose(img_batch, (0, 3, 1, 2))
Where:
np.expand_dims: adds the 'batch' dimension to your input image.
np.transpose: moves the axes into the right order. After adding 'batch', a height x width x channels image has shape (1, 640, 640, 3), and you need to reorder it to match the input the model was trained with, for example (1, 3, 640, 640).
Remember to resize your image to (224, 224) instead of (640, 640).
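The steps above can be sketched end to end as follows. This is only a minimal illustration: a zero-filled array stands in for the output of cv2.resize(image, (224, 224)), and scaling pixel values to [0, 1] as float32 is an assumption about what the model expects, not something the question confirms.

```python
import numpy as np

# Stand-in for cv2.resize(image, (224, 224)): height x width x channels, uint8.
resized_image = np.zeros((224, 224, 3), dtype=np.uint8)

# Assumption: the model expects float32 values scaled to [0, 1].
img_normalized = resized_image.astype(np.float32) / 255.0

# Add the batch dimension: (224, 224, 3) -> (1, 224, 224, 3).
img_batch = np.expand_dims(img_normalized, axis=0)

# Reorder to batch, channels, height, width: (1, 224, 224, 3) -> (1, 3, 224, 224).
img_transposed = np.transpose(img_batch, (0, 3, 1, 2))

print(img_batch.shape)       # (1, 224, 224, 3)
print(img_transposed.shape)  # (1, 3, 224, 224)
```

Compare the final shape with what input_info.shape prints for your model, and adjust the size or the axis order if they differ.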
Try again; I hope this helps.
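One more thing that may be worth checking: the error message counts model inputs, not tensor dimensions, so the dictionary passed to session.run probably needs one entry per name returned by session.get_inputs(). A rough sketch of that check is below. build_feed is a hypothetical helper, not part of onnxruntime, and the input names and placeholder values are made up for illustration; a namedtuple stands in for the objects returned by session.get_inputs().

```python
from collections import namedtuple


def build_feed(input_infos, arrays_by_name):
    """Return a feed dict covering every model input, or raise if one is missing."""
    names = [info.name for info in input_infos]
    missing = [name for name in names if name not in arrays_by_name]
    if missing:
        raise ValueError(f"No array supplied for model inputs: {missing}")
    return {name: arrays_by_name[name] for name in names}


# Stand-in for session.get_inputs(); the names and shapes here are hypothetical.
InputInfo = namedtuple("InputInfo", ["name", "shape"])
inputs = [
    InputInfo("input_a", ["batch", 3, 224, 224]),
    InputInfo("input_b", ["batch", 4]),
]

# Placeholder strings stand in for the NumPy arrays you would actually pass.
feed = build_feed(inputs, {"input_a": "array_a", "input_b": "array_b"})
print(sorted(feed))  # ['input_a', 'input_b']
```

With a real session you would pass session.get_inputs() as the first argument and then call session.run(None, feed), which returns all model outputs.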