Attribute error while integrating Google TTS with YOLOv8
Question
<details>
<summary>English:</summary>
My project aims to detect object labels and coordinates, convert them into a string, and then convert that string into speech using gTTS, but I keep getting an AttributeError when reading the prediction labels. I am new to this framework; any help will be appreciated.
Code:

```python
import cv2
from gtts import gTTS
import os
from ultralytics import YOLO

def convert_labels_to_text(labels):
    text = ", ".join(labels)
    return text

class YOLOWithLabels(YOLO):
    def __call__(self, frame):
        results = super().__call__(frame)
        labels = results.pred[0].get_field("labels").tolist()
        annotated_frame = results.render()
        return annotated_frame, labels

cap = cv2.VideoCapture(0)
model = YOLOWithLabels('yolov8n.pt')

while cap.isOpened():
    success, frame = cap.read()
    if success:
        annotated_frame, labels = model(frame)
        message = convert_labels_to_text(labels)
        tts_engine = gTTS(text=message)  # Initialize gTTS with the message
        tts_engine.save("output.mp3")
        os.system("output.mp3")
        cv2.putText(annotated_frame, message, (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
        cv2.imshow("YOLOv8 Inference", annotated_frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
    else:
        break

cap.release()
cv2.destroyAllWindows()
```
Error:

~~~
  File "C:\Users\alien\Desktop\YOLOv8 project files\gtts service\testservice.py", line 13, in __call__
    labels = results.pred[0].get_field("labels").tolist()
             ^^^^^^^^^^^^
AttributeError: 'list' object has no attribute 'pred'
~~~
Output of `print(results)`:

~~~
0: 480x640 1 person, 272.4ms
Speed: 3.0ms preprocess, 272.4ms inference, 4.0ms postprocess per image at shape (1, 3, 640, 640)
[ultralytics.yolo.engine.results.Results object with attributes:

boxes: ultralytics.yolo.engine.results.Boxes object
keypoints: None
keys: ['boxes']
masks: None
names: {0: 'person', 1: 'bicycle', 2: 'car', 3: 'motorcycle', 4: 'airplane', 5: 'bus', 6: 'train', 7: 'truck', 8: 'boat', 9: 'traffic light', 10: 'fire hydrant', 11: 'stop sign', 12: 'parking meter', 13: 'bench', 14: 'bird', 15: 'cat', 16: 'dog', 17: 'horse', 18: 'sheep', 19: 'cow', 20: 'elephant', 21: 'bear', 22: 'zebra', 23: 'giraffe', 24: 'backpack', 25: 'umbrella', 26: 'handbag', 27: 'tie', 28: 'suitcase', 29: 'frisbee', 30: 'skis', 31: 'snowboard', 32: 'sports ball', 33: 'kite', 34: 'baseball bat', 35: 'baseball glove', 36: 'skateboard', 37: 'surfboard', 38: 'tennis racket', 39: 'bottle', 40: 'wine glass', 41: 'cup', 42: 'fork', 43: 'knife', 44: 'spoon', 45: 'bowl', 46: 'banana', 47: 'apple', 48: 'sandwich', 49: 'orange', 50: 'broccoli', 51: 'carrot', 52: 'hot dog', 53: 'pizza', 54: 'donut', 55: 'cake', 56: 'chair', 57: 'couch', 58: 'potted plant', 59: 'bed', 60: 'dining table', 61: 'toilet', 62: 'tv', 63: 'laptop', 64: 'mouse', 65: 'remote', 66: 'keyboard', 67: 'cell phone', 68: 'microwave', 69: 'oven', 70: 'toaster', 71: 'sink', 72: 'refrigerator', 73: 'book', 74: 'clock', 75: 'vase', 76: 'scissors', 77: 'teddy bear', 78: 'hair drier', 79: 'toothbrush'}
orig_img: array([[[168, 167, 166],
        [165, 165, 165],
        [165, 166, 167],
        ...,
        [183, 186, 178],
        [183, 186, 178],
        [184, 187, 179]],

       [[168, 167, 165],
        [166, 165, 165],
        [166, 167, 166],
        ...,
        [184, 187, 179],
        [183, 186, 178],
        [184, 187, 179]],

       [[168, 167, 164],
        [167, 167, 164],
        [167, 167, 165],
        ...,
        [184, 187, 178],
        [184, 187, 179],
        [183, 186, 178]],

       ...,

       [[196, 192, 185],
        [196, 192, 185],
        [196, 192, 185],
        ...,
        [ 25,  29,  38],
        [ 22,  25,  35],
        [ 20,  24,  34]],

       [[199, 195, 187],
        [197, 193, 186],
        [197, 193, 186],
        ...,
        [ 23,  26,  35],
        [ 22,  25,  35],
        [ 22,  25,  35]],

       [[199, 195, 187],
        [199, 195, 187],
        [199, 195, 187],
        ...,
        [ 20,  24,  33],
        [ 19,  23,  33],
        [ 19,  23,  33]]], dtype=uint8)
orig_shape: (480, 640)
path: 'image0.jpg'
probs: None
save_dir: None
speed: {'preprocess': 3.1604766845703125, 'inference': 307.905912399292, 'postprocess': 2.8924942016601562}]
~~~
</details>
# Answer 1

**Score**: 0

May God have mercy on whoever wrote Ultralytics's docs... Here's how you can print only the labels:

```python
from ultralytics import YOLO

model = YOLO('yolov8n.pt')
results = model('http://images.cocodataset.org/val2017/000000397133.jpg')
print(model.names)  # dict mapping class index -> class name

for result in results:  # one Results object per input image
    boxes = result.boxes.cpu().numpy()
    for box in boxes:
        # box.cls holds the class index as a float; cast it for the lookup
        print(model.names[int(box.cls[0])])
```

`model.names` contains all the classes that can be predicted. Each `box` has a `cls` (short for class) attribute holding the index of the predicted class. You can look that index up in the `model.names` dictionary.

PS: `box.cls` is array-like rather than a plain scalar, which is why the example indexes it with `box.cls[0]`. When you iterate over `result.boxes`, each `box` carries a single detection, so the first element is the only one.
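To connect this back to the question: the `print(results)` output shows that calling the model returns a list of `Results` objects, which is why `results.pred` raises the AttributeError. Below is a minimal sketch of how the question's webcam loop might look with that in mind. The helper names `labels_from_result`, `labels_to_message`, and `run_webcam` are hypothetical, and the sketch assumes a recent Ultralytics release where `Results` exposes `boxes.cls`, `names`, and `plot()`. Audio playback is left out; only the MP3 is written.

```python
from collections import Counter


def labels_from_result(result, names):
    """Return the class name of every detected box in one Results-like object.

    Assumes result.boxes.cls is iterable and yields numeric class ids
    (Ultralytics stores them as floats) and that names maps int id -> str.
    """
    return [names[int(c)] for c in result.boxes.cls]


def labels_to_message(labels):
    """Collapse repeated labels into a short phrase such as '2 person, 1 dog'."""
    counts = Counter(labels)
    return ", ".join(f"{n} {label}" for label, n in counts.items())


def run_webcam(model_path="yolov8n.pt"):
    """Hypothetical loop: detect, build a message, save speech when it changes."""
    # Heavy imports stay local so the helpers above work without them.
    import cv2
    from gtts import gTTS
    from ultralytics import YOLO

    model = YOLO(model_path)
    cap = cv2.VideoCapture(0)
    last_message = None

    while cap.isOpened():
        success, frame = cap.read()
        if not success:
            break

        # The call returns a list; a single frame gives a single Results object.
        result = model(frame)[0]
        message = labels_to_message(labels_from_result(result, result.names))

        # gTTS needs network access and rejects empty text, so skip frames with
        # no detections, and skip unchanged messages to avoid one request per frame.
        if message and message != last_message:
            gTTS(text=message).save("output.mp3")
            last_message = message

        annotated = result.plot()  # frame with boxes and labels drawn on it
        cv2.putText(annotated, message, (10, 30),
                    cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
        cv2.imshow("YOLOv8 Inference", annotated)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break

    cap.release()
    cv2.destroyAllWindows()
```

Calling `run_webcam()` starts the loop. Subclassing `YOLO` is not needed here, since the labels can be read straight from the returned `Results` object. Only regenerating the audio when the message changes is a design choice of this sketch, not something the original answer prescribes.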