Method for feeding a multi-class image dataset in PyTorch, using the folder names as labels?

Question

I want to feed a multi-class image dataset to PyTorch. The main dataset folder contains 15 subfolders with different names, and I want to use the folder names as the labels.
For example, one folder is named Aeroplanes and contains 1245 images; another is named Cars and contains 997 images of cars. Likewise, each folder holds a different number of images.
Now I want to load them to train and test my model, but I don't have separate folders for training and testing. I want to use the folder names as labels and also split the dataset into training and test sets in an equal ratio.
Your guidance in this case will be appreciated. Thanks.

Answer 1

Score: 2

To split your dataset into training and test sets, you can use the random_split function:

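import torch
from torchvision import datasets, transforms
from torch.utils import data
import numpy as np

# ImageFolder expects path_to_dataset/<class_name>/<image files>. Each
# subfolder (e.g. Aeroplanes, Cars) becomes one class; the folder names are
# sorted and mapped to integer labels, see dataset.class_to_idx.
# ToTensor does not resize: if the images differ in size, put e.g.
# transforms.Resize((224, 224)) before it in a transforms.Compose so that
# the DataLoader can stack them into batches (224 is only an example).
dataset = datasets.ImageFolder('path_to_dataset', transform=transforms.ToTensor())

# 50/50 split; ceil/floor guarantee the two lengths add up to len(dataset).
lengths = [int(np.ceil(0.5*len(dataset))),
           int(np.floor(0.5*len(dataset)))]
# Seeding the generator (optional) makes the split reproducible between runs.
train_set, test_set = data.random_split(
    dataset, lengths, generator=torch.Generator().manual_seed(42))

# Reshuffle the training data every epoch; keep the test order fixed.
train_dataloader = data.DataLoader(train_set, batch_size=..., shuffle=True)
test_dataloader = data.DataLoader(test_set, batch_size=...)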
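ImageFolder takes the labels from the subfolder names: it sorts the class folder names and numbers them from 0, and the resulting mapping is available as dataset.class_to_idx. The pure-Python sketch below imitates that rule so the labelling is easy to inspect. It is a simplified assumption, not torchvision's code: folder_labels is a hypothetical helper, it only looks at files directly inside each class folder, and IMAGE_EXTENSIONS is a hypothetical, shorter list than the one ImageFolder actually accepts.

```python
import os
import tempfile

# Hypothetical, simplified list of image extensions (ImageFolder accepts more).
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".bmp")


def folder_labels(root):
    """Return (class_to_idx, samples), deriving labels from subfolder names."""
    # Each immediate subfolder is one class; sorting the names makes the
    # label numbering deterministic (Aeroplanes -> 0, Cars -> 1, ...).
    classes = sorted(e.name for e in os.scandir(root) if e.is_dir())
    class_to_idx = {name: i for i, name in enumerate(classes)}

    # Pair every image file with the integer label of the folder it is in.
    samples = []
    for name in classes:
        class_dir = os.path.join(root, name)
        for fname in sorted(os.listdir(class_dir)):
            if fname.lower().endswith(IMAGE_EXTENSIONS):
                samples.append((os.path.join(class_dir, fname), class_to_idx[name]))
    return class_to_idx, samples


if __name__ == "__main__":
    # Tiny fake dataset: two class folders containing empty "image" files.
    with tempfile.TemporaryDirectory() as root:
        for cls, n in [("Cars", 2), ("Aeroplanes", 3)]:
            os.makedirs(os.path.join(root, cls))
            for i in range(n):
                open(os.path.join(root, cls, f"{i}.jpg"), "w").close()
        mapping, samples = folder_labels(root)
        print(mapping)       # {'Aeroplanes': 0, 'Cars': 1}
        print(len(samples))  # 5
```

With the real dataset, printing dataset.class_to_idx after creating the ImageFolder shows the actual folder-to-label mapping that the model will be trained on.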

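random_split shuffles the whole dataset, so when the classes have different sizes (1245 Aeroplanes vs. 997 Cars) each class is only split about 50/50 on average. If you want every class divided in the same ratio (up to rounding), one option is a stratified split over the sample indices, sketched below. stratified_split is a hypothetical helper; for an ImageFolder dataset, the labels it expects would come from dataset.targets.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=0.5, seed=0):
    """Return (train_indices, test_indices) with each class split in the same ratio.

    labels: one integer class label per sample, in dataset order
    (for torchvision's ImageFolder this would be dataset.targets).
    """
    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # seeded so the split is reproducible
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)
        # Floor for the test share, so the training share gets the extra
        # sample when a class has an odd count (like the ceil/floor split above).
        n_test = int(len(idxs) * (1 - train_fraction))
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])
    return train_idx, test_idx


if __name__ == "__main__":
    # 1245 "Aeroplanes" (label 0) and 997 "Cars" (label 1), as in the question.
    labels = [0] * 1245 + [1] * 997
    train_idx, test_idx = stratified_split(labels)
    print(len(train_idx), len(test_idx))  # 1122 1120
```

The two index lists can then be wrapped with torch.utils.data.Subset, e.g. train_set = data.Subset(dataset, train_idx), and passed to DataLoader in place of the random_split result.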
If you want to apply different transformations (for example, data augmentation only on the training set) to the training and test sets, see: https://stackoverflow.com/questions/51782021/how-to-use-different-data-augmentation-for-subsets-in-pytorch
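One common way to do this is a small wrapper that applies its own transform to each subset returned by random_split. The sketch below assumes the ImageFolder is created without a transform (so items are PIL images) and that each item is an (image, label) pair; TransformedSubset is a hypothetical name. It is written as a plain class with __getitem__ and __len__ so it runs without PyTorch; in real code you would typically subclass torch.utils.data.Dataset.

```python
class TransformedSubset:
    """Wrap a dataset (e.g. a Subset from random_split) and apply a transform.

    Assumes each item of the wrapped dataset is an (input, label) pair.
    """

    def __init__(self, subset, transform=None):
        self.subset = subset
        self.transform = transform

    def __getitem__(self, index):
        x, y = self.subset[index]
        # Only the input is transformed; the label (folder index) is unchanged.
        if self.transform is not None:
            x = self.transform(x)
        return x, y

    def __len__(self):
        return len(self.subset)


if __name__ == "__main__":
    # Stand-in dataset of (input, label) pairs; real code would pass a Subset.
    base = [("plane.jpg", 0), ("car.jpg", 1)]
    upper = TransformedSubset(base, transform=str.upper)
    print(upper[1], len(upper))  # ('CAR.JPG', 1) 2
```

For example, after dataset = datasets.ImageFolder('path_to_dataset') and the random_split above, you could use train_set = TransformedSubset(train_set, transforms.Compose([transforms.RandomHorizontalFlip(), transforms.ToTensor()])) and test_set = TransformedSubset(test_set, transforms.ToTensor()).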

huangapple
  • This article was posted on January 6, 2020 at 02:42:28
  • When reposting, please keep the link to this article: https://go.coder-hub.com/59603064.html