What is s in the COCO Object Keypoint Similarity equation?

Question

I am trying to understand Object Keypoint Similarity (OKS), which is used to evaluate keypoint detection algorithms. However, from the definition I cannot fully understand what "s" in the equation means.
Here is the equation (https://cocodataset.org/#keypoints-eval):

$$\mathrm{OKS} = \frac{\sum_i \exp\left(-\frac{d_i^2}{2 s^2 \kappa_i^2}\right)\,\delta(v_i > 0)}{\sum_i \delta(v_i > 0)}$$

where "s" is described as the object scale. I don't know what that really means. Can someone explain it in more detail?
Thank you

Answer 1

Score: 0

What does s represent?

The s in the OKS equation represents the object scale, a measure of the size of the object the keypoints belong to. In keypoint detection tasks such as human pose estimation (one of the most common applications of keypoint detection), the "object" is the entity of interest in the image, in this case a person.

The "scale" of an object refers to its size in the image. COCO defines s as the square root of the object's segment area, so s is a length measured in pixels rather than an area; the area of the bounding box containing the object is a rough stand-in for the same idea. For a human pose estimation task, if the bounding box around the person is large, the person (the object) appears large in the image and therefore has a larger scale. Conversely, if the bounding box is small, the person appears small in the image and the scale is smaller.
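To make the "s is a length, not an area" point concrete, here is a minimal sketch assuming s² equals the object's area in pixels. It uses bounding-box areas purely for illustration (COCO's own evaluation uses the annotated segment area); the function name `object_scale` and the box sizes are hypothetical.

```python
import math

def object_scale(area: float) -> float:
    """Object scale s, assuming s**2 equals the object's area in pixels."""
    return math.sqrt(area)

# Hypothetical person boxes: the second is the first enlarged 2x in each dimension.
small = object_scale(50 * 100)    # area 5000 px^2
large = object_scale(100 * 200)   # area 20000 px^2
print(large / small)  # 2.0
```

Enlarging the box 2x in each dimension quadruples the area but only doubles s, which is why s can be compared directly with pixel distances such as d_i.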

The scale matters when considering the distances between keypoints (elbows, knees, eyes, etc. on a human body). At a larger scale (big bounding box), the keypoints are generally farther apart because the person takes up more of the image. At a smaller scale (small bounding box), the keypoints are closer together because the person appears smaller.

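In the OKS equation, s (together with the per-keypoint constant κ_i) normalizes the distances d_i between the predicted and ground-truth keypoints, so each error is measured relative to the size of the object rather than in raw pixels.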
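A minimal Python sketch of the equation, assuming s² is the object's area and that the per-keypoint constants κ_i are supplied by the caller. The function and argument names are hypothetical. For reference, pycocotools (COCO's Python API) derives κ_i as 2σ_i from per-keypoint standard deviations σ_i; this sketch leaves that choice to the caller, and it simply returns 0.0 when no keypoint is labeled, a case the official evaluation code handles differently.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]

def oks(gt: Sequence[Point], pred: Sequence[Point], vis: Sequence[int],
        area: float, kappas: Sequence[float]) -> float:
    """Object Keypoint Similarity for one ground-truth/prediction pair.

    Assumes s**2 == area, i.e. s is the square root of the object's area.
    """
    num = 0.0
    den = 0
    for (gx, gy), (px, py), v, k in zip(gt, pred, vis, kappas):
        if v > 0:  # delta(v_i > 0): only labeled keypoints count
            d2 = (gx - px) ** 2 + (gy - py) ** 2        # d_i^2
            num += math.exp(-d2 / (2 * area * k ** 2))  # exp(-d_i^2 / (2 s^2 kappa_i^2))
            den += 1
    # Simplification (assumption): no labeled keypoints gives 0.0.
    return num / den if den else 0.0

gt = [(10.0, 10.0), (20.0, 20.0), (30.0, 30.0)]
pred = [(10.0, 10.0), (23.0, 24.0), (0.0, 0.0)]
vis = [2, 2, 0]  # third keypoint is unlabeled, so its bad prediction is ignored
score = oks(gt, pred, vis, area=400.0, kappas=[0.1, 0.1, 0.1])
```

Here the first keypoint is exact (term 1.0), the second is off by 5 pixels, and the third does not count, so the score is the average of two terms.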

Why divide by it?

Dividing by the scale accounts for the fact that an error of, say, five pixels is much more significant when the person is small in the image (and the keypoints are close together) than when the person is large (and the keypoints are far apart).
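The five-pixel example can be checked numerically with the per-keypoint term of the OKS equation. This is a minimal sketch; the helper name `keypoint_similarity`, the value κ = 0.1, and the two scales are hypothetical choices for illustration.

```python
import math

def keypoint_similarity(d: float, s: float, kappa: float) -> float:
    """Per-keypoint term exp(-d^2 / (2 s^2 kappa^2)) from the OKS equation."""
    return math.exp(-d ** 2 / (2 * s ** 2 * kappa ** 2))

kappa = 0.1   # hypothetical per-keypoint constant
error = 5.0   # the same 5-pixel error in both cases

small_person = keypoint_similarity(error, s=20.0, kappa=kappa)   # area 400 px^2
large_person = keypoint_similarity(error, s=200.0, kappa=kappa)  # area 40000 px^2
print(f"{small_person:.3f} {large_person:.3f}")  # 0.044 0.969
```

The identical pixel error nearly zeroes the similarity for the small person but barely affects it for the large one.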

huangapple
  • Posted on July 23, 2023 16:55:07
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76747386.html