How to measure image classification model robustness?

Question

Image classification models trained on animal classification data such as iNaturalist or iWildCam sometimes develop spurious correlations with the background. How can one measure the performance limitations caused only by such spurious correlations, as opposed to other plausible (non-spurious) causes (e.g., two animals that genuinely look very much alike)?

Answer 1

Score: 0

Google [1], [4] defines in-distribution robustness as a model's performance on a held-out test set drawn from the same data as the training set. Out-of-distribution (OOD) robustness, which is the focus of the question, is the model's performance when classifying the same objects in a different dataset. The benchmark datasets Google used to demonstrate its new state-of-the-art model, ViT-Plex, were CIFAR-10 vs. CIFAR-100 [2], CIFAR-100 vs. CIFAR-10, ImageNet vs. Places365, and RETINA. Papers with Code also lists several other benchmark datasets for OOD detection [3].
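The sketch below is a minimal illustration of the two quantities described above. It assumes you already have softmax probabilities from your classifier for an in-distribution held-out set and for a shifted set that contains the same classes. For the original question, the shifted set should ideally differ mainly in background (for example, the same species photographed at camera locations not seen in training). A large ID-minus-OOD accuracy gap on such a split is one rough signal of reliance on background cues. It does not, by itself, separate that effect from genuinely similar-looking species. All function names and toy numbers are hypothetical. The OOD-detection part scores each image by its maximum softmax probability and reports AUROC. This is a common baseline and an assumption here, not necessarily the scoring method used in [4].

```python
from typing import List, Sequence, Tuple


def predicted_class(probs: Sequence[float]) -> int:
    """Index of the highest-probability class (the first one wins on ties)."""
    return max(range(len(probs)), key=lambda i: probs[i])


def accuracy(probs: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Fraction of samples whose predicted class equals the true label."""
    correct = sum(1 for p, y in zip(probs, labels) if predicted_class(p) == y)
    return correct / len(labels)


def robustness_gap(
    id_probs: Sequence[Sequence[float]],
    id_labels: Sequence[int],
    ood_probs: Sequence[Sequence[float]],
    ood_labels: Sequence[int],
) -> Tuple[float, float, float]:
    """Return (in-distribution accuracy, OOD accuracy, ID minus OOD gap).

    Assumes both sets share the same label space and that the OOD set differs
    only in its data distribution (e.g. unseen backgrounds/locations).
    """
    id_acc = accuracy(id_probs, id_labels)
    ood_acc = accuracy(ood_probs, ood_labels)
    return id_acc, ood_acc, id_acc - ood_acc


def max_softmax_scores(probs: Sequence[Sequence[float]]) -> List[float]:
    """Per-sample confidence = largest class probability.

    A common OOD-detection baseline; an assumption, not necessarily what [4] used.
    """
    return [max(p) for p in probs]


def auroc(id_scores: Sequence[float], ood_scores: Sequence[float]) -> float:
    """Probability that a random ID sample scores higher than a random OOD
    sample, counting ties as 0.5. Pairwise O(n*m) loop, fine for a sketch."""
    wins = 0.0
    for s in id_scores:
        for t in ood_scores:
            if s > t:
                wins += 1.0
            elif s == t:
                wins += 0.5
    return wins / (len(id_scores) * len(ood_scores))


# Hypothetical toy softmax outputs for a 2-class problem.
id_probs = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3], [0.6, 0.4]]
id_labels = [0, 1, 0, 1]
# Same classes, shifted data; also reused as the "OOD" set for AUROC here.
ood_probs = [[0.55, 0.45], [0.6, 0.4], [0.3, 0.7], [0.4, 0.6]]
ood_labels = [1, 0, 0, 1]

id_acc, ood_acc, gap = robustness_gap(id_probs, id_labels, ood_probs, ood_labels)
print(f"ID acc={id_acc:.2f}  OOD acc={ood_acc:.2f}  gap={gap:.2f}")
# ID acc=0.75  OOD acc=0.50  gap=0.25

score = auroc(max_softmax_scores(id_probs), max_softmax_scores(ood_probs))
print(f"AUROC (max softmax) = {score:.5f}")
# AUROC (max softmax) = 0.84375
```

The pairwise AUROC loop is written for readability. For realistically sized test sets, a rank-based implementation or a library routine would be the practical choice.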

[1] https://ai.googleblog.com/2022/07/towards-reliability-in-deep-learning.html

[2] https://paperswithcode.com/sota/out-of-distribution-detection-on-cifar-100-vs

[3] https://paperswithcode.com/task/out-of-distribution-detection

[4] https://arxiv.org/pdf/2207.07411.pdf

  • Posted by huangapple on 2023-02-10 09:59:50
  • Original link: https://go.coder-hub.com/75406257.html