How can I feed a multi-class image dataset into PyTorch using folder names as labels?
Question
I want to feed a multi-class image dataset into PyTorch. The dataset's main folder contains 15 subfolders with different names, and I want to use the folder names as labels.
For example, one folder is named Aeroplanes and contains 1245 images, another is named Cars and contains 997 images of cars; likewise, each folder holds a different number of images.
I want to load them to train and test my model, but I don't have separate folders for training and testing. I want to use the folder names as labels and also split the dataset into training and test sets in an equal ratio.
Any guidance would be appreciated. Thanks.
Answer 1
Score: 2
To split your dataset into training and test sets, you can use the random_split function. ImageFolder takes the labels from the subfolder names:
from torch.utils import data
from torchvision import datasets, transforms

# Load every image under the root folder as a tensor.
dataset = datasets.ImageFolder('path_to_dataset', transform=transforms.ToTensor())

# 50/50 split; the two lengths must sum to len(dataset).
n_train = (len(dataset) + 1) // 2  # rounds up when the total is odd
n_test = len(dataset) - n_train
train_set, test_set = data.random_split(dataset, [n_train, n_test])

# Replace ... with a batch size of your choice.
train_dataloader = data.DataLoader(train_set, batch_size=...)
test_dataloader = data.DataLoader(test_set, batch_size=...)
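random_split shuffles all images together, so with classes of different sizes (1245 Aeroplanes, 997 Cars, and so on) the two halves may not contain exactly the same share of each class. If "equal ratio" is meant per class, a stratified split is one option. The following is only a minimal sketch in plain Python: the helper name stratified_split_indices and its parameters are made up for illustration and are not part of PyTorch, and the toy label list stands in for the real labels.

```python
import random
from collections import defaultdict


def stratified_split_indices(targets, test_fraction=0.5, seed=0):
    """Split sample indices so each class keeps roughly the same train/test ratio.

    `targets` is a sequence of integer class labels, one per sample.
    Returns (train_indices, test_indices).
    """
    # Group sample indices by class label.
    by_class = defaultdict(list)
    for index, label in enumerate(targets):
        by_class[label].append(index)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train_indices, test_indices = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Number of test samples for this class, rounded to the nearest integer.
        n_test = int(round(len(indices) * test_fraction))
        test_indices.extend(indices[:n_test])
        train_indices.extend(indices[n_test:])
    return train_indices, test_indices


# Toy labels (an assumption for the demo): 6 samples of class 0, 4 of class 1.
toy_targets = [0] * 6 + [1] * 4
train_idx, test_idx = stratified_split_indices(toy_targets, test_fraction=0.5)
```

With an ImageFolder dataset, dataset.targets holds the class index of every image and can be passed as targets; wrapping the results as data.Subset(dataset, train_idx) and data.Subset(dataset, test_idx) then gives two subsets that can go into DataLoader in place of train_set and test_set.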
If you want to apply different transformations to the training and test sets, see: https://stackoverflow.com/questions/51782021/how-to-use-different-data-augmentation-for-subsets-in-pytorch
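One common way to get per-subset transformations is to create the base dataset without a transform and wrap each subset in a small class that applies its own transform on access. The sketch below is not taken from the linked answer; the class name TransformedSubset is hypothetical, and the toy list of (image, label) pairs only stands in for a real subset so the example runs without any image files.

```python
class TransformedSubset:
    """Wrap a map-style dataset (or subset) and apply a transform to each image."""

    def __init__(self, subset, transform=None):
        self.subset = subset
        self.transform = transform

    def __len__(self):
        return len(self.subset)

    def __getitem__(self, index):
        image, label = self.subset[index]
        # Apply the transform only to the image; the label is passed through.
        if self.transform is not None:
            image = self.transform(image)
        return image, label


# Toy stand-in for a subset: numbers play the role of images (an assumption for the demo).
toy_subset = [(1, 0), (2, 1), (3, 0)]
doubled = TransformedSubset(toy_subset, transform=lambda x: x * 2)
first_item = doubled[0]
```

In real use, the wrapped object would be train_set or test_set from random_split, and transform would be a torchvision transform pipeline that ends in transforms.ToTensor(). Subclassing torch.utils.data.Dataset is the usual convention for such a wrapper; the plain class here just keeps the sketch dependency-free.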