How do I turn a numpy ndarray into a PyTorch dataset?

Question

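I have a numpy ndarray of shape (16699, 128, 128), where each element is a 128 by 128 pixel image normalized to the range 0 to 1.
Now, to feed an image into a neural network model, I have to take each element of the array, convert it to a tensor, and add one extra dimension with .unsqueeze(0) to bring it to the format (C, H, W).
So I'd like to simplify all this with PyTorch's dataloader and dataset utilities so that I can use batches and so on. How can I do it?

This is the method I have now:

import torch

epochs = 3

for epoch in range(epochs):
    for i in range(len(X)):
        # separate name so the label array y is not overwritten
        label = torch.as_tensor(y[i])
        x = torch.from_numpy(X[i]).unsqueeze(0)  # (H, W) -> (1, H, W)
        ...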

Answer 1

Score: 1

One way is to convert X and y into two tensors with the same first dimension (one label per image), then wrap them in a torch.utils.data.TensorDataset.

import torch
from torch.utils.data import TensorDataset, DataLoader

batch_size = 128
# unsqueeze(1) adds the channel dimension: (N, H, W) -> (N, 1, H, W)
# if the model expects float32 input, call .float() on the image tensor
dataset = TensorDataset(torch.from_numpy(X).unsqueeze(1), torch.from_numpy(y))
loader = DataLoader(dataset, shuffle=True, batch_size=batch_size)

...

# training loop
for epoch in range(epochs):
    for x, y in loader:
        # x is a tensor batch of images with shape (batch_size, 1, H, W)
        # y is a tensor with the corresponding labels
        ...
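
If converting the whole array up front is not desirable, a custom torch.utils.data.Dataset that converts one sample at a time is another option. The following is only a minimal sketch, not part of the answer above: the class name NumpyImageDataset is hypothetical, the small random arrays stand in for the real data, integer class labels are assumed, and the cast to float32 is an assumption about what the model expects.

```python
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader


class NumpyImageDataset(Dataset):
    """Hypothetical helper: wraps image and label arrays, converting per sample."""

    def __init__(self, images, labels):
        assert len(images) == len(labels), "need one label per image"
        self.images = images
        self.labels = labels

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        # (H, W) -> (1, H, W); float32 is an assumption about the model's input dtype
        x = torch.from_numpy(self.images[idx]).float().unsqueeze(0)
        y = torch.as_tensor(self.labels[idx])
        return x, y


# Small synthetic stand-in for the real (16699, 128, 128) array
X = np.random.rand(10, 128, 128)
y = np.random.randint(0, 2, size=10)  # assumed integer class labels

loader = DataLoader(NumpyImageDataset(X, y), batch_size=4, shuffle=True)
xb, yb = next(iter(loader))
print(xb.shape, yb.shape)  # image batch and label batch
```

The DataLoader's default collation stacks the per-sample tensors, so each batch of images has shape (batch_size, 1, H, W), the same layout as with TensorDataset. For data that already fits in memory, TensorDataset is the shorter route; a custom class is mainly useful when per-sample transforms or lazy loading are needed.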

</details>



Posted by huangapple on 2023-03-07 23:46:03
Source: https://go.coder-hub.com/75664176.html