How to measure image classification model robustness?

Question

Image classification models trained on animal classification data such as iNaturalist or iWildcam sometimes develop spurious correlations with the background. How can we measure the model performance limitations caused only by such spurious correlations, as opposed to other plausible (non-spurious) reasons (e.g., two animals that genuinely look very similar)?
Answer 1

Score: 0
Google [1][4] define In-Distribution robustness as a model's performance on a hold-out test set from the same data. Out-of-Distribution (OOD) robustness, which is the focus of the question, is the model's performance at classifying the same objects on a different dataset. The benchmark datasets Google used to demonstrate their state-of-the-art model "ViT-Plex" were CIFAR-10 vs CIFAR-100 [2], CIFAR-100 vs CIFAR-10, ImageNet vs Places365, and RETINA. Papers With Code also lists multiple other benchmark datasets for OOD detection [3].
[1] https://ai.googleblog.com/2022/07/towards-reliability-in-deep-learning.html
[2] https://paperswithcode.com/sota/out-of-distribution-detection-on-cifar-100-vs
[3] https://paperswithcode.com/task/out-of-distribution-detection
[4] https://arxiv.org/pdf/2207.07411.pdf
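The comparison described above can be sketched as a simple metric: evaluate the same model on an in-distribution hold-out set and on an out-of-distribution set, then look at the accuracy gap. A large gap suggests the model is leaning on distribution-specific cues (such as the background) rather than the object itself. This is a minimal illustrative sketch, not the evaluation protocol from the cited papers; the prediction arrays below are hypothetical stand-ins for real model outputs.

```python
import numpy as np

def accuracy(preds, labels):
    """Fraction of predictions that match the ground-truth labels."""
    return float(np.mean(np.asarray(preds) == np.asarray(labels)))

def robustness_gap(id_preds, id_labels, ood_preds, ood_labels):
    """In-distribution accuracy minus out-of-distribution accuracy.

    A gap near 0 means performance transfers across the distribution
    shift; a large positive gap hints at reliance on spurious cues.
    """
    return accuracy(id_preds, id_labels) - accuracy(ood_preds, ood_labels)

# Hypothetical predictions on a hold-out (ID) set and a shifted (OOD) set
id_preds, id_labels = [0, 1, 1, 0], [0, 1, 1, 0]    # perfect ID accuracy
ood_preds, ood_labels = [0, 0, 1, 1], [0, 1, 1, 0]  # half right under shift
print(robustness_gap(id_preds, id_labels, ood_preds, ood_labels))  # → 0.5
```

Note that the gap alone cannot separate spurious-correlation failures from genuinely hard cases (e.g., two visually similar species); for that, the OOD set must be constructed so the shift isolates the suspected cue, such as the same species photographed against different backgrounds.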
Comments