Posted: January 6, 2020, 02:42:28

How to feed a multi-class image dataset into PyTorch, using folder names as labels?

Question

I want to feed a multi-class image dataset into PyTorch. The main dataset folder contains 15 subfolders with different names, and I want to use the folder names as labels.
For example, one folder is named Aeroplanes and contains 1245 images, another is named Cars and contains 997 images of cars; likewise, each folder holds a different number of images.
I want to load them to train and test my model, but I don't have separate folders for training and testing. I want to use the folder names as labels and also split the dataset into training and testing sets in equal ratio.
Any guidance would be appreciated. Thanks.

Answer 1

Score: 2

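To split your dataset into training and test sets, you can use the random_split function. datasets.ImageFolder takes care of the labels: it treats each subfolder of the root directory as one class, sorted by folder name, and exposes the mapping as dataset.classes and dataset.class_to_idx: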

from torchvision import datasets, transforms
from torch.utils import data

# Each subfolder of 'path_to_dataset' becomes one class; its name is the label
dataset = datasets.ImageFolder('path_to_dataset', transform=transforms.ToTensor())

# 50/50 split; the two lengths must sum to len(dataset)
n_train = (len(dataset) + 1) // 2  # half, rounded up
n_test = len(dataset) - n_train    # the remainder
train_set, test_set = data.random_split(dataset, [n_train, n_test])

# Replace ... with your batch size; shuffle only the training data
train_dataloader = data.DataLoader(train_set, batch_size=..., shuffle=True)
test_dataloader = data.DataLoader(test_set, batch_size=...)
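
Note that random_split draws samples uniformly at random, so the class proportions in the two halves are only approximately preserved. If "equal ratio" is meant per class, a stratified split is one option. The following is a minimal sketch rather than part of the original answer: the helper name stratified_split_indices, its parameters, and the toy labels are all hypothetical, and it uses only the standard library so it can be tried without a dataset on disk.

```python
import random
from collections import defaultdict


def stratified_split_indices(targets, train_fraction=0.5, seed=0):
    """Split sample indices so each class keeps roughly the same train/test ratio.

    Hypothetical helper: `targets` is a sequence of class labels, one per sample.
    """
    # Group sample indices by their class label
    by_class = defaultdict(list)
    for index, label in enumerate(targets):
        by_class[label].append(index)

    rng = random.Random(seed)  # fixed seed (an assumption) for a repeatable split
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Number of training samples for this class, rounded to an integer
        cut = round(len(indices) * train_fraction)
        train_idx.extend(indices[:cut])
        test_idx.extend(indices[cut:])
    return train_idx, test_idx


# Toy labels standing in for real ones: 6 samples of class 0, 4 of class 1
toy_targets = [0] * 6 + [1] * 4
train_idx, test_idx = stratified_split_indices(toy_targets)
```

With the dataset from the answer, dataset.targets holds ImageFolder's per-sample class indices, so the two index lists could be wrapped as data.Subset(dataset, train_idx) and data.Subset(dataset, test_idx) before building the DataLoaders.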

If you want to apply different transformations to your train and test datasets, see: https://stackoverflow.com/questions/51782021/how-to-use-different-data-augmentation-for-subsets-in-pytorch
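
One common pattern for per-split transforms is a thin wrapper around each subset. The sketch below is an illustration under assumptions, not necessarily the approach taken in the linked thread: the class name TransformedSubset is hypothetical, and it assumes the underlying ImageFolder was created without a transform so that each wrapper can apply its own pipeline. Plain numbers stand in for images so the example runs without any data.

```python
class TransformedSubset:
    """Map-style dataset view that applies its own transform to a subset's samples.

    Hypothetical wrapper: `subset` is anything indexable that yields (sample, label).
    """

    def __init__(self, subset, transform=None):
        self.subset = subset
        self.transform = transform

    def __len__(self):
        return len(self.subset)

    def __getitem__(self, index):
        sample, label = self.subset[index]
        # Apply this split's transform, if any; the label is left untouched
        if self.transform is not None:
            sample = self.transform(sample)
        return sample, label


# Toy stand-in for a subset of (image, label) pairs
toy_subset = [(1, 0), (2, 1), (3, 0)]
train_view = TransformedSubset(toy_subset, transform=lambda x: x * 10)  # stands in for augmentation
test_view = TransformedSubset(toy_subset)  # no transform applied
```

In real use, train_set and test_set from random_split would take the place of toy_subset, and the transforms would be torchvision pipelines ending in transforms.ToTensor(). A DataLoader generally only needs __len__ and __getitem__ from a map-style dataset, though subclassing torch.utils.data.Dataset would be the more conventional choice.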
