OpenPose on low-resolution images?
Question
I'm trying to extract human pose information from low-resolution images. In particular, I've tried the Keras OpenPose implementation by michalfaber, but the model seems to perform poorly on low-resolution images while performing quite well on higher-resolution ones. I also posted this question as an issue on the GitHub repo, but I thought I'd ask here too, since I'm not set on that particular implementation of human pose detection.
My images are about 50-100 pixels in both width and height.
This is an example of the image. Does anyone know a way to modify the program or the network, or know of a human pose network that performs well on such low-resolution images?
Answer 1
Score: 1
If you are looking for a different human pose estimation network, I would highly recommend the MXNet GluonCV framework (https://gluon-cv.mxnet.io/model_zoo/pose.html). It is very simple to use and includes many different pose estimation networks, so you can try them and compare the tradeoff between accuracy and speed. For example, you can use it like this (taken from the tutorial page):
from matplotlib import pyplot as plt
from gluoncv import model_zoo, data, utils
from gluoncv.data.transforms.pose import detector_to_alpha_pose, heatmap_to_coord_alpha_pose

# Load a pretrained person detector and a pretrained pose network.
detector = model_zoo.get_model('yolo3_mobilenet1.0_coco', pretrained=True)
pose_net = model_zoo.get_model('alpha_pose_resnet101_v1b_coco', pretrained=True)

# Note that we can reset the classes of the detector to only include
# human, so that the NMS process is faster.
detector.reset_class(["person"], reuse_weights=['person'])

# Download the sample image and preprocess it for the detector.
im_fname = utils.download('https://github.com/dmlc/web-data/blob/master/' +
                          'gluoncv/pose/soccer.png?raw=true',
                          path='soccer.png')
x, img = data.transforms.presets.yolo.load_test(im_fname, short=512)
print('Shape of pre-processed image:', x.shape)

# Detect people, then crop each detection into an input for the pose network.
class_IDs, scores, bounding_boxs = detector(x)
pose_input, upscale_bbox = detector_to_alpha_pose(img, class_IDs, scores, bounding_boxs)

# Predict keypoint heatmaps and map them back to image coordinates.
predicted_heatmap = pose_net(pose_input)
pred_coords, confidence = heatmap_to_coord_alpha_pose(predicted_heatmap, upscale_bbox)
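On low-resolution inputs some keypoints are likely to come back with low confidence, so it may help to filter them before use. The following is a minimal sketch and not part of the GluonCV tutorial. It assumes `pred_coords` and `confidence` have been converted to nested Python lists (for instance via `.asnumpy().tolist()`), with shapes (people, keypoints, 2) and (people, keypoints, 1). The helper name `confident_keypoints` and the 0.2 threshold are hypothetical, and the demo values are stand-ins rather than real model output.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def confident_keypoints(
    coords: List[List[List[float]]],
    conf: List[List[List[float]]],
    threshold: float = 0.2,  # hypothetical cutoff; tune it for your data
) -> List[List[Optional[Point]]]:
    """Keep keypoints whose confidence reaches `threshold`; mark the rest as None."""
    people = []
    for person_coords, person_conf in zip(coords, conf):
        kept = []
        for (x, y), (score,) in zip(person_coords, person_conf):
            kept.append((x, y) if score >= threshold else None)
        people.append(kept)
    return people


# Stand-in data for one person with three keypoints (not real model output).
demo_coords = [[[10.0, 12.0], [20.0, 22.0], [30.0, 32.0]]]
demo_conf = [[[0.9], [0.1], [0.5]]]
print(confident_keypoints(demo_coords, demo_conf))
```

Keeping a `None` placeholder instead of dropping the entry preserves the keypoint order, so index `k` still refers to the same joint for every person.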
As an accuracy comparison, for example, their AlphaPose with a ResNet-101 backbone is significantly more accurate than OpenPose (you can find more accuracy benchmarks at the link above). One caveat, however, is that you need to understand the difference between the types of these networks, namely bottom-up and top-down approaches, since the choice can affect inference speed in different scenarios.
For example, the runtime of top-down approaches is proportional to the number of detected people, so inference can be time-consuming if your image contains a crowd.
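The scaling difference can be illustrated with a toy cost model. This is only a sketch under stated assumptions: the function names are hypothetical and every millisecond figure is a made-up placeholder, not a measurement of any real network. It assumes a top-down pipeline pays one detector pass plus one pose pass per person, while a bottom-up pipeline pays one whole-image pass plus a small per-person grouping cost.

```python
def top_down_ms(num_people: int, detector_ms: float = 30.0, pose_ms: float = 15.0) -> float:
    """One detector pass plus one pose-network pass per detected person."""
    return detector_ms + num_people * pose_ms


def bottom_up_ms(num_people: int, network_ms: float = 80.0, grouping_ms: float = 1.0) -> float:
    """One whole-image pass plus a small per-person keypoint-grouping cost."""
    return network_ms + num_people * grouping_ms


# Placeholder timings only: compare how the two totals grow with crowd size.
for n in (1, 5, 20):
    print(n, top_down_ms(n), bottom_up_ms(n))
```

With these placeholder numbers the top-down total is lower for a single person but grows much faster as the crowd gets larger. In practice, batching the per-person crops can reduce the top-down cost, so it is worth timing both approaches on your own images.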