If we choose grid cells in YOLO that are too small, will it be harder to detect an object?

Question

In the picture, I split the image into a 3 by 3 grid. I know a 19 by 19 grid would be better. But will detection get worse if I choose a grid that is too dense, like 3000 by 3000? (Don't worry about running time, only detection accuracy.)

I think it will, because a conv net works on local spatial correlations, which means a cell doesn't know what happens even in the adjacent cell. So if the network concentrates too much on the small details in 3000 by 3000 cells, it will be hard to train it to learn the features of a whole object.

Am I right? If I have made a mistake, please correct me. Thanks!

Answer 1

Score: 0

Splitting an image into 3000 by 3000 patches for object detection, or for any other task, can hurt accuracy: information or features may be lost at the edges of the patches, so the details of the objects in the image are not captured and accuracy drops. Take YOLOv3, for instance. It makes predictions at three layers: the 13 x 13 layer is responsible for detecting large objects, the 26 x 26 layer detects medium objects, and the 52 x 52 layer detects smaller objects. For more details, check out this blog post: https://towardsdatascience.com/yolo-v3-object-detection-53fb7d3bfe6b. The question is: why do you need to split it into 3000 by 3000?
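
The grid sizes quoted above follow from simple arithmetic, which the rough sketch below illustrates. It assumes a 416 x 416 input and downsampling strides of 32, 16, and 8 (a common YOLOv3 configuration that the answer does not state explicitly). The helper names `grid_sizes`, `cell_size_px`, and `responsible_cell` are made up for this illustration and are not part of any YOLO library.

```python
# Illustrative sketch only: input size and strides are assumptions,
# and all function names here are hypothetical helpers.

def grid_sizes(input_size, strides=(32, 16, 8)):
    """Grid side length at each detection scale: input size divided by stride."""
    return [input_size // s for s in strides]


def cell_size_px(image_size, grid):
    """Side length, in pixels, of one cell when the image is cut into grid x grid cells."""
    return image_size / grid


def responsible_cell(cx, cy, image_size, grid):
    """(row, col) of the cell that contains an object's center point (cx, cy)."""
    cell = cell_size_px(image_size, grid)
    # Clamp so a center lying exactly on the right/bottom border stays in range.
    col = min(int(cx / cell), grid - 1)
    row = min(int(cy / cell), grid - 1)
    return row, col


print(grid_sizes(416))  # [13, 26, 52]

# Compare how many pixels a single cell covers for a coarse, typical, and very dense grid.
for grid in (3, 19, 3000):
    print(grid, cell_size_px(416, grid))

# The cell containing the object's center is the one assigned to predict it.
print(responsible_cell(200, 100, image_size=416, grid=13))  # (3, 6)
```

Under these assumptions, a 3000 by 3000 grid on a 416-pixel image would make each cell smaller than a single pixel, which is one way to see why such a dense grid stops being meaningful.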

Posted by huangapple on April 4, 2023 at 09:08:40
Link to this post: https://go.coder-hub.com/75924774.html