PyTorch: how will next() behave for a list of DataLoaders of different length

Question

My data has several conditions A, B, C. I would like to do the following.

  • Draw a sample for each condition
  • Draw a random sample from the full data set
  • Some training magic

Thus, I would have in one batch something like

[condition_A, condition_B, condition_C, random_sample]

I have created a dictionary of the form

loader_dict = {
    cond_A: DataLoader(...Subset Magic...),
    cond_B: DataLoader(...Subset Magic...),
    cond_C: DataLoader(...Subset Magic...),
}

train_loader = DataLoader(...full dataset...)
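
For context, here is a minimal, self-contained sketch of how such condition-specific loaders might be built with torch.utils.data.Subset; the toy dataset, the label-based split, and the batch size are made up for illustration and are not from the original post:

import torch
from torch.utils.data import DataLoader, Subset, TensorDataset

# made-up stand-in dataset: 100 samples with a condition label 0/1/2,
# split roughly 50% / 35% / 15% like the proportions mentioned below
data = torch.randn(100, 8)
cond = torch.tensor([0] * 50 + [1] * 35 + [2] * 15)
full_dataset = TensorDataset(data, cond)

# one Subset (and hence one DataLoader) per condition, selected by label
loader_dict = {
    c: DataLoader(
        Subset(full_dataset, (cond == c).nonzero(as_tuple=True)[0].tolist()),
        batch_size=4,
        shuffle=True,
    )
    for c in (0, 1, 2)
}
train_loader = DataLoader(full_dataset, batch_size=4, shuffle=True)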

Now during each epoch I would like to

  1. Get a batch from each of the 4 loaders
  2. Process them in some net shenanigans

Currently, I am a bit stuck on the 1st point.

My approach so far is

from tqdm import tqdm

# get a list of batch iterators of form [iter_A, iter_B, iter_C]
train_loaders = [iter(loader) for loader in loader_dict.values()]

for batch_idx, batch in enumerate(tqdm(train_loader)):
    # draw one batch per condition loader
    condit_sample = [next(loader) for loader in train_loaders]

    # do something with torch.cat([batch, *condit_sample])

Now I am not sure - will the next() call always just return the first batch of each condition loader (not desired), or will it actually iterate through that condition's samples?

Also, my data is split roughly 50% condition_A, 35% condition_B, 15% condition_C.

Thus, I wonder whether my code would run through, say, all 100 batches of the full dataset while repeating condition_A twice, condition_B nearly three times, and condition_C six to seven times? Or will the code just run through all samples of condition_C and then break?

For now, cycling through the conditional samples multiple times would suffice; a sketch of how to make that explicit follows.
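
If that cycling should be guaranteed rather than left to whatever next() happens to do, wrapping each loader in itertools.cycle would be one option (a sketch building on the setup above, not necessarily the built-in behavior):

from itertools import cycle

from tqdm import tqdm

# cycle() transparently restarts each condition loader once it is exhausted
train_loaders = [cycle(loader) for loader in loader_dict.values()]

for batch_idx, batch in enumerate(tqdm(train_loader)):
    # the smaller condition loaders simply wrap around as often as needed
    condit_sample = [next(loader) for loader in train_loaders]

One caveat: itertools.cycle caches the batches from its first pass and replays them, so later passes reuse the first epoch's shuffling; re-creating iter(loader) on StopIteration avoids that.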

For later purposes, I would like to consider the following (see the sketch after this list):

  • pick a genuinely random sample from the full dataset (something different in each epoch)
  • cycle through all the conditional loader samples
  • terminate the epoch as soon as the smallest condition loader is "cycled through"
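
For that last point, plain zip() already stops at the shortest iterable, so something like this could work (again using the hypothetical loaders sketched above; shuffle=True gives every loader a fresh random order each epoch):

num_epochs = 3  # placeholder value

for epoch in range(num_epochs):
    # zip() stops once the shortest loader is exhausted, so the epoch ends
    # exactly when the smallest condition subset has been seen once
    for batch, *condit_sample in zip(train_loader, *loader_dict.values()):
        pass  # training step on batch and condit_sample goes here

Because zip() truncates to the shortest input, no StopIteration handling is needed, and starting a new epoch re-creates all the iterators, which re-shuffles every loader.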

Answer 1

Score: 0

I ran the experiment myself. It will behave like itertools.cycle().
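
A quick way to re-run that experiment (toy data made up here; plain iter() is used so that any StopIteration surfaces instead of being hidden):

import torch
from torch.utils.data import DataLoader, TensorDataset

# toy loader: 4 samples -> 2 batches of 2
loader = DataLoader(TensorDataset(torch.arange(4)), batch_size=2)

it = iter(loader)
for step in range(5):  # deliberately more steps than there are batches
    try:
        print(step, next(it))
    except StopIteration:
        # if the iterator does not wrap around on its own, restart it manually;
        # this reproduces the cycle-like behavior described in the answer
        it = iter(loader)
        print(step, next(it))

If a given setup raises StopIteration instead of wrapping around, the try/except above restores the cycle-like behavior either way.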
