YoloV8 TFLite Python Predictions and Interpreting Output

Question

I am new to Python, Flutter and ML. I am trying to convert YOLOv8 into a TFLite model to later build a Flutter application.

I managed to convert YOLOv8x to a TFLite model using the yolo export command.

Before I move that model into Flutter, I am trying to test the model in Python to make sure it functions as expected. The code I am using is below.

    import numpy as np
    import tensorflow as tf

    # Load the TFLite model
    model_path = "C:\\Users\\yolov8x_saved_model\\yolov8x_float32.tflite"
    interpreter = tf.lite.Interpreter(model_path=model_path)
    interpreter.allocate_tensors()

    # Get input and output details
    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()

    # Load and preprocess the image
    image_path = "C:\\Users\\Downloads\\2.jpeg"
    image = tf.keras.preprocessing.image.load_img(image_path, target_size=(640, 640))
    image_array = tf.keras.preprocessing.image.img_to_array(image)
    preprocessed_image = np.expand_dims(image_array, axis=0)

    # Set the input tensor to the preprocessed image
    interpreter.set_tensor(input_details[0]['index'], preprocessed_image)

    # Run the inference
    interpreter.invoke()

    # Get the output tensor and reshape it
    output_tensor = interpreter.get_tensor(output_details[0]['index'])
    output_shape = output_details[0]['shape']
    outputs = np.reshape(output_tensor, output_shape)
    print(outputs)

The output is:

    [[[6.20934343e+00 1.20168591e+01 1.99987564e+01 ... 5.18638123e+02
       5.35865967e+02 5.85887085e+02]
      ...
      [1.57089694e-03 6.52399845e-04 1.49149655e-05 ... 2.00569357e-05
       1.41740784e-05 5.61324532e-06]]]

So I try to convert it:

    from pathlib import Path
    import re
    import yaml
    import cv2

    def yaml_load(file='data.yaml', append_filename=False):
        with open(file, errors='ignore', encoding='utf-8') as f:
            s = f.read()  # string
            # Remove special characters
            if not s.isprintable():
                s = re.sub(r'[^\x09\x0A\x0D\x20-\x7E\x85\xA0-\uD7FF\uE000-\uFFFD\U00010000-\U0010ffff]+', '', s)
            # Add YAML filename to dict and return
            return {**yaml.safe_load(s), 'yaml_file': str(file)} if append_filename else yaml.safe_load(s)

    CLASSES = yaml_load("C:\\Users\\Downloads\\coco128.yml")['names']
    colors = np.random.uniform(0, 255, size=(len(CLASSES), 3))

    original_image: np.ndarray = cv2.imread("C:\\Users\\Downloads\\2.jpeg")
    [height, width, _] = original_image.shape
    length = max((height, width))
    image = np.zeros((length, length, 3), np.uint8)
    image[0:height, 0:width] = original_image
    scale = length / 640

    def draw_bounding_box(img, class_id, confidence, x, y, x_plus_w, y_plus_h):
        label = f'{CLASSES[class_id]} ({confidence:.2f})'
        color = colors[class_id]
        cv2.rectangle(img, (x, y), (x_plus_w, y_plus_h), color, 2)
        cv2.putText(img, label, (x - 10, y - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 2)

    outputs = np.array([cv2.transpose(outputs[0])])
    rows = outputs.shape[1]

    boxes = []
    scores = []
    class_ids = []

    for i in range(rows):
        classes_scores = outputs[0][i][4:]
        (minScore, maxScore, minClassLoc, (x, maxClassIndex)) = cv2.minMaxLoc(classes_scores)
        if maxScore >= 0.60:
            box = [outputs[0][i][0] - (0.5 * outputs[0][i][2]), outputs[0][i][1] - (0.5 * outputs[0][i][3]),
                   outputs[0][i][2], outputs[0][i][3]]
            boxes.append(box)
            scores.append(maxScore)
            class_ids.append(maxClassIndex)

    result_boxes = cv2.dnn.NMSBoxes(boxes, scores, 0.25, 0.45, 0.5)

    detections = []
    for i in range(len(result_boxes)):
        index = result_boxes[i]
        box = boxes[index]
        detection = {
            'class_id': class_ids[index],
            'class_name': CLASSES[class_ids[index]],
            'confidence': scores[index],
            'box': box,
            'scale': scale}
        if CLASSES[class_ids[index]] == 'person':
            detections.append(detection)
            draw_bounding_box(original_image, class_ids[index], scores[index], round(box[0] * scale), round(box[1] * scale),
                              round((box[0] + box[2]) * scale), round((box[1] + box[3]) * scale))

    cv2.imshow('image', original_image)
    cv2.waitKey(0)
    cv2.destroyAllWindows()

The problem I am getting is that the model predicts almost everything as a person. There are 2 people in the image, but I get over 100 person predictions, each at roughly 70 to 100% confidence.

Any help would be appreciated.

Answer 1

Score: 1

There are some preprocessing and postprocessing steps that are used by the YOLOv8 CLI and thus should be implemented in your pipeline (a note on the raw output layout follows this list):

  1. Resizing and Padding (Letterboxing)
  2. Non-Maximum Suppression (NMS)
  3. Rescaling Bounding Boxes
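
Before the individual steps, a note on interpreting the raw tensor (this is the standard YOLOv8 detection head layout; please verify it against your own export): the output has shape (1, 4 + nc, num_candidates), e.g. (1, 84, 8400) for a COCO model at 640x640, where the first four rows are cx, cy, w, h in input-image pixels and the remaining rows are per-class scores with no separate objectness row. A minimal sketch of slicing it, using the outputs array from your first script:

    # Assumed layout (standard YOLOv8 detect head; confirm for your own export):
    #   outputs.shape == (1, 4 + nc, num_candidates), e.g. (1, 84, 8400) for COCO at 640x640
    x = outputs[0]                        # (4 + nc, num_candidates)
    boxes_xywh = x[:4].T                  # (num_candidates, 4): cx, cy, w, h in input-image pixels
    class_scores = x[4:].T                # (num_candidates, nc): per-class confidences (no objectness)
    best_class = class_scores.argmax(axis=1)
    best_score = class_scores.max(axis=1)
    print(boxes_xywh.shape, best_class.shape, float(best_score.max()))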

Resizing and Padding (Letterboxing) - the code of the LetterBox class can be found here.

    image_path = "demo.jpg"
    imgsize = 512
    im = [LetterBox(imgsize, auto=False, stride=32)(image=cv2.imread(image_path))]
    im = np.stack(im)
    print(im.shape)
    im = im[..., ::-1].transpose((0, 1, 2, 3))  # BGR to RGB, BHWC to BCHW, (n, 3, h, w)
    print(im.shape)
    im = np.ascontiguousarray(im)  # contiguous
    im = im.astype(np.float32)
    im /= 255

    # Allocate input and output tensors
    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()

    # Prepare the input tensor
    input_data = im
    interpreter.set_tensor(input_details[0]['index'], input_data)

    # Run inference
    interpreter.invoke()
    output_data = interpreter.get_tensor(output_details[0]['index'])

You can see that the .transpose call does not change the shape of im. I used the onnx2tf tool for conversion from ONNX to TFLite, and it produced a different output shape than the model originally had. If that is not the case for you, then use the original code.
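
If you prefer not to import the Ultralytics LetterBox class, a minimal sketch of equivalent preprocessing could look like the following (my own simplified version, not the original class; the gray padding value 114 and the centering are assumptions based on typical YOLO letterboxing):

    # Minimal letterbox sketch (illustration only): resize so the longer side fits
    # the target size while keeping aspect ratio, then pad the remainder with gray.
    import cv2
    import numpy as np

    def letterbox(img, new_size=512, pad_value=114):
        h, w = img.shape[:2]
        r = min(new_size / h, new_size / w)              # scale ratio (new / old)
        new_w, new_h = int(round(w * r)), int(round(h * r))
        resized = cv2.resize(img, (new_w, new_h), interpolation=cv2.INTER_LINEAR)
        canvas = np.full((new_size, new_size, 3), pad_value, dtype=np.uint8)
        top = (new_size - new_h) // 2                    # center vertically
        left = (new_size - new_w) // 2                   # center horizontally
        canvas[top:top + new_h, left:left + new_w] = resized
        return canvas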

Non-Maximum Suppression (NMS) - this is used to handle overlapping bounding boxes. It keeps the bounding box with the highest confidence score and suppresses all the other bounding boxes with high overlap (IoU). The original code is located here. Here is my simplified version for testing purposes:

    nc = 0
    conf_thres = 0.25
    bs = output_data.shape[0]  # batch size
    nc = nc or (output_data.shape[1] - 4)  # number of classes
    nm = output_data.shape[1] - nc - 4
    mi = 4 + nc  # mask start index
    xc = np.amax(output_data[:, 4:mi], 1) > conf_thres  # candidates

    multi_label = False
    multi_label &= nc > 1  # multiple labels per box (adds 0.5ms/img)

    prediction = np.transpose(output_data, (0, -1, -2))

    def xywh2xyxy(x):
        """
        Convert bounding box coordinates from (x, y, width, height) format to (x1, y1, x2, y2) format where (x1, y1) is the
        top-left corner and (x2, y2) is the bottom-right corner.

        Args:
            x (np.ndarray | torch.Tensor): The input bounding box coordinates in (x, y, width, height) format.

        Returns:
            y (np.ndarray | torch.Tensor): The bounding box coordinates in (x1, y1, x2, y2) format.
        """
        y = np.copy(x)
        y[..., 0] = x[..., 0] - x[..., 2] / 2  # top left x
        y[..., 1] = x[..., 1] - x[..., 3] / 2  # top left y
        y[..., 2] = x[..., 0] + x[..., 2] / 2  # bottom right x
        y[..., 3] = x[..., 1] + x[..., 3] / 2  # bottom right y
        return y

    prediction[..., :4] = xywh2xyxy(prediction[..., :4])  # xywh to xyxy

    output = [np.zeros((0, 6 + nm))] * bs
    max_nms = 30000
    agnostic = False
    max_wh = 7680
    iou_thres = 0.45
    max_det = 300

    for xi, x in enumerate(prediction):  # image index, image inference
        x = x[xc[xi]]  # confidence
        if not x.shape[0]:
            continue
        # Detections matrix nx6 (xyxy, conf, cls)
        box = x[:, :4]
        cls = x[:, 4:4 + nc]
        mask = x[:, 4 + nc:4 + nc + nm]
        conf = np.max(cls, axis=1, keepdims=True)
        j = np.argmax(cls, axis=1, keepdims=True)
        # Concatenate the arrays along axis 1
        x = np.concatenate((box, conf, j.astype(float), mask), axis=1)
        # Reshape conf to a 1-dimensional array
        conf_flat = conf.flatten()
        # Filter the resulting array based on the condition conf_flat > conf_thres
        filtered_x = x[conf_flat > conf_thres]
        n = filtered_x.shape[0]  # number of boxes
        if not n:  # no boxes
            continue
        if n > max_nms:  # excess boxes
            # Sort x based on the 5th column in descending order
            sorted_indices = np.argsort(x[:, 4])[::-1]
            # Select the top max_nms rows based on the sorted indices
            x = x[sorted_indices[:max_nms]]
        c = x[:, 5:6] * (0 if agnostic else max_wh)
        boxes, scores = x[:, :4] + c, x[:, 4]  # boxes (offset by class), scores
        # Apply NMS using cv2.dnn.NMSBoxes function
        i = cv2.dnn.NMSBoxes(boxes, scores, score_threshold=0.4, nms_threshold=iou_thres)
        i = i[:max_det]  # limit detections
        output[xi] = x[i]
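
For intuition about what the cv2.dnn.NMSBoxes call is doing, here is a conceptual pure-NumPy greedy NMS sketch (my own illustration on xyxy boxes; the real OpenCV call has its own box format and tie-breaking, so treat this as explanatory only):

    # Conceptual greedy NMS on xyxy boxes: keep the highest-scoring box, drop
    # every remaining box whose IoU with it exceeds the threshold, repeat.
    import numpy as np

    def nms_xyxy(boxes, scores, iou_thres=0.45):
        order = np.argsort(scores)[::-1]               # indices sorted by score, best first
        keep = []
        while order.size > 0:
            i = order[0]
            keep.append(i)
            # Intersection of the best box with all remaining boxes
            x1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
            y1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
            x2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
            y2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
            inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
            area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
            area_r = (boxes[order[1:], 2] - boxes[order[1:], 0]) * (boxes[order[1:], 3] - boxes[order[1:], 1])
            iou = inter / (area_i + area_r - inter + 1e-7)
            order = order[1:][iou <= iou_thres]        # keep only boxes with low overlap
        return np.array(keep)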

Rescaling Bounding Boxes - This step is necessary because the output bounding box coordinates are given in the frame of the letterboxed input image (512x512 here). To get coordinates for the original image, you'll need to rescale the bounding box coordinates.

    def clip_boxes(boxes, shape):
        """
        It takes a list of bounding boxes and a shape (height, width) and clips the bounding boxes to the
        shape

        Args:
            boxes (torch.Tensor): the bounding boxes to clip
            shape (tuple): the shape of the image
        """
        boxes[..., [0, 2]] = boxes[..., [0, 2]].clip(0, shape[1])  # x1, x2
        boxes[..., [1, 3]] = boxes[..., [1, 3]].clip(0, shape[0])  # y1, y2

    def scale_boxes(img1_shape, boxes, img0_shape, ratio_pad=None):
        """
        Rescales bounding boxes (in the format of xyxy) from the shape of the image they were originally specified in
        (img1_shape) to the shape of a different image (img0_shape).

        Args:
            img1_shape (tuple): The shape of the image that the bounding boxes are for, in the format of (height, width).
            boxes (torch.Tensor): the bounding boxes of the objects in the image, in the format of (x1, y1, x2, y2)
            img0_shape (tuple): the shape of the target image, in the format of (height, width).
            ratio_pad (tuple): a tuple of (ratio, pad) for scaling the boxes. If not provided, the ratio and pad will be
                calculated based on the size difference between the two images.

        Returns:
            boxes (torch.Tensor): The scaled bounding boxes, in the format of (x1, y1, x2, y2)
        """
        if ratio_pad is None:  # calculate from img0_shape
            gain = min(img1_shape[0] / img0_shape[0], img1_shape[1] / img0_shape[1])  # gain = old / new
            pad = round((img1_shape[1] - img0_shape[1] * gain) / 2 - 0.1), round(
                (img1_shape[0] - img0_shape[0] * gain) / 2 - 0.1)  # wh padding
        else:
            gain = ratio_pad[0][0]
            pad = ratio_pad[1]

        boxes[..., [0, 2]] -= pad[0]  # x padding
        boxes[..., [1, 3]] -= pad[1]  # y padding
        boxes[..., :4] /= gain
        clip_boxes(boxes, img0_shape)
        return boxes

    results = []
    img = cv2.imread(image_path)
    for i, pred in enumerate(output):
        pred[:, :4] = scale_boxes((512, 512), pred[:, :4], img.shape)
        results.append(pred)
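
As a quick sanity check of the math (with a hypothetical 1280x720 original image, not from the original post): gain = min(512/720, 512/1280) = 0.4 and the vertical padding is round((512 - 720 * 0.4) / 2 - 0.1) = 112 px, so a letterboxed box (100, 212, 200, 312) maps back to (250, 250, 500, 500):

    # Worked example with an assumed 1280x720 original image (illustration only):
    # gain = 0.4, pad = (0, 112)
    sample = np.array([[100.0, 212.0, 200.0, 312.0]])   # xyxy in the 512x512 letterboxed frame
    print(scale_boxes((512, 512), sample, (720, 1280, 3)))
    # -> [[250. 250. 500. 500.]]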

The original code can be found here.

And then draw the bounding boxes on the image:

    for detection in results:
        print(detection)
        xmin, ymin, xmax, ymax, conf, class_id = detection[0]
        # Convert float xyxy coordinates to integers
        xmin = int(xmin)
        ymin = int(ymin)
        xmax = int(xmax)
        ymax = int(ymax)
        # Draw the rectangle on the image
        cv2.rectangle(img, (xmin, ymin), (xmax, ymax), (0, 255, 0), 2)
        # Add text label
        label = f"Class {int(class_id)}: {conf:.2f}"
        cv2.putText(img, label, (xmin, ymin - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.9, (0, 255, 0), 2)
Here is my Google Colab. Hope it helps!

Update: To apply NMS on a mobile device, you should consider stitching it into the ONNX model (before conversion to TFLite) as described here, since as far as I know no known library supports NMS operations on-device.
