April 13, 2023, 18:39:11

Data augmentation in PyTorch for CNN

Question

I want to apply data augmentation to my set of images so that I have more data for training a convolutional neural network in PyTorch.

Example of transformations:

train_transforms = Compose([
    LoadImage(image_only=True),
    EnsureChannelFirst(),
    ScaleIntensity(),
    RandRotate(range_x=np.pi / 12, prob=0.5, keep_size=True),
    RandFlip(spatial_axis=0, prob=0.5),
])

As I understand it, transforms in PyTorch modify each image, and then only the transformed image is used, not the original. I want to transform my data and then use both the original and the transformed versions, since my goal is to augment the data. How can these transformations actually increase the number of input samples? For example, if I augment with a flip, I want to train on both the original images and the flipped ones.

I tried adding transformations to my data, but it seems that only the transformed data is used: the data changes, but the amount of data does not increase.

Answer 1

Score: 0

If you want to use the original and the augmented data at the same time, you can concatenate the two datasets and create a dataloader over the result. The steps are:

1. Create a dataset with data augmentations.
2. Create a dataset without data augmentations.
3. Create a dataset by concatenating both.
4. Create a dataloader with the concatenated dataset.

I assume you already know how to create a dataset with data augmentation. To concatenate several datasets, you can use:

from torch.utils.data import ConcatDataset
concat_dataset = ConcatDataset([dataset1, dataset2])

More information is available here:
https://discuss.pytorch.org/t/how-does-concatdataset-work/60083/2
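
The four steps above can be sketched end to end as follows. This is only a minimal illustration under stated assumptions: `TensorImageDataset` and `horizontal_flip` are hypothetical names that do not come from the question, the images are random tensors standing in for real data, and a deterministic flip is used in place of the random transforms from the question so that the result is easy to check.

```python
import torch
from torch.utils.data import ConcatDataset, DataLoader, Dataset


class TensorImageDataset(Dataset):
    """Hypothetical dataset wrapping an in-memory image tensor of shape (N, C, H, W)."""

    def __init__(self, images, labels, transform=None):
        self.images = images
        self.labels = labels
        self.transform = transform  # optional callable applied to each image

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        image = self.images[idx]
        if self.transform is not None:
            image = self.transform(image)
        return image, self.labels[idx]


def horizontal_flip(image):
    # Flip a (C, H, W) tensor along its last (width) axis.
    return torch.flip(image, dims=[-1])


# Stand-in data: 8 random single-channel 4x4 "images" with binary labels.
images = torch.rand(8, 1, 4, 4)
labels = torch.randint(0, 2, (8,))

# Step 1: dataset with augmentation.
augmented_dataset = TensorImageDataset(images, labels, transform=horizontal_flip)
# Step 2: dataset without augmentation.
original_dataset = TensorImageDataset(images, labels)
# Step 3: concatenate both; originals come first, augmented copies second.
concat_dataset = ConcatDataset([original_dataset, augmented_dataset])
# Step 4: dataloader over the concatenated dataset.
loader = DataLoader(concat_dataset, batch_size=4, shuffle=True)

print(len(original_dataset), len(concat_dataset))
```

With `shuffle=True`, original and flipped samples are mixed within each batch, and the concatenated dataset is twice as long as the original one.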

Answer 2

Score: 0

In your Dataset class, you can check whether the index is greater than or equal to the number of original samples and, if so, return an augmented image.

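from torch.utils.data import Dataset

class ExampleDataset(Dataset):
    def __init__(self):
        self.data = ...
        self.real_length = len(self.data)
        # Expose every sample twice: once as-is, once augmented.
        self.length = self.real_length * 2

    def __len__(self):
        return self.length

    def __getitem__(self, idx):
        if idx < self.real_length:
            # First half of the index range: original samples.
            return self.data[idx]
        else:
            # Second half: augmented copy of the matching original.
            # `augment` stands for your own augmentation function.
            return augment(self.data[idx - self.real_length])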

You can extend the dataset further (3x, 4x, and so on), depending on how many augmentations you want to apply.
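
One possible way to generalize this index trick to several augmentations is sketched below. It assumes the samples fit in an indexable container and that each augmentation is a plain callable; `MultiAugmentDataset` and the two flip functions are hypothetical names introduced only for this illustration, not part of the answer above.

```python
import torch
from torch.utils.data import Dataset


class MultiAugmentDataset(Dataset):
    """Hypothetical dataset: originals first, then one full copy per augmentation."""

    def __init__(self, data, augmentations):
        self.data = data
        self.augmentations = list(augmentations)
        self.real_length = len(data)

    def __len__(self):
        # One block of originals plus one block per augmentation.
        return self.real_length * (1 + len(self.augmentations))

    def __getitem__(self, idx):
        if not 0 <= idx < len(self):
            raise IndexError(idx)
        # copy_index selects the block, sample_index the sample inside it.
        copy_index, sample_index = divmod(idx, self.real_length)
        sample = self.data[sample_index]
        if copy_index == 0:
            return sample  # block 0 holds the untouched originals
        return self.augmentations[copy_index - 1](sample)


def flip_width(image):
    # Flip a (C, H, W) tensor along its width axis.
    return torch.flip(image, dims=[-1])


def flip_height(image):
    # Flip a (C, H, W) tensor along its height axis.
    return torch.flip(image, dims=[-2])


# Stand-in data: 6 single-channel 2x2 "images".
data = torch.arange(24, dtype=torch.float32).reshape(6, 1, 2, 2)
dataset = MultiAugmentDataset(data, [flip_width, flip_height])

print(len(dataset))
```

Here indices 0 to 5 return the originals, 6 to 11 the width-flipped copies, and 12 to 17 the height-flipped copies, so two augmentations triple the dataset length.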
