PyTorch: how will next() behave for a list of DataLoaders of different length?

Question


My data has several conditions A, B, C. I would like to do the following.

  • Draw a sample for each condition
  • Draw a random sample from the full data set
  • Some training magic

Thus, I would have in one batch something like

[condition_A, condition_B, condition_C, random_sample]

I have created a dictionary of the form

loader_dict = {
    cond_A: DataLoader(...Subset Magic...),
    cond_B: DataLoader(...Subset Magic...),
    cond_C: DataLoader(...Subset Magic...),
}

train_loader = DataLoader(...full dataset...)
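
For concreteness, here is a minimal, hedged sketch of what such a setup could look like; the tensors, labels, and batch sizes below are hypothetical stand-ins, not the actual data:

import torch
from torch.utils.data import DataLoader, Subset, TensorDataset

# hypothetical stand-ins: 100 samples, each tagged with one condition
data = torch.randn(100, 8)
labels = ["cond_A"] * 50 + ["cond_B"] * 35 + ["cond_C"] * 15
full_dataset = TensorDataset(data)

# one loader per condition, built from index subsets of the full dataset
loader_dict = {
    cond: DataLoader(
        Subset(full_dataset, [i for i, lab in enumerate(labels) if lab == cond]),
        batch_size=4,
        shuffle=True,
    )
    for cond in ("cond_A", "cond_B", "cond_C")
}
train_loader = DataLoader(full_dataset, batch_size=4, shuffle=True)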

Now during each epoch I would like to

  1. Get a batch from each of the 4 loaders
  2. Process them in some net shenanigans

Currently, I am a bit stuck on the first point.

My approach so far is:

from tqdm import tqdm

# get a list of iterators of the form [iter_A, iter_B, iter_C];
# next() needs an iterator, and a DataLoader itself is only an iterable
train_loaders = [iter(loader) for loader in loader_dict.values()]

for batch_idx, batch in enumerate(tqdm(train_loader)):
    condit_sample = [next(loader) for loader in train_loaders]

    # do something with torch.cat([batch, *condit_sample])

Now I am not sure: will the next() call always just pick the first batch of the condition loaders (not desired), or will it actually iterate through the conditions' samples?
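
For reference, next() operates on iterators, not on the DataLoader itself, so the outcome depends on how the iterator is created. A quick check, reusing the hypothetical loaders from the sketch above:

# a fresh iterator starts at the first batch (reshuffled if shuffle=True)
it = iter(loader_dict["cond_C"])
first = next(it)    # batch 1
second = next(it)   # batch 2 -- next() advances; it does not restart
# after the last batch, next(it) raises StopIteration;
# the iterator does not wrap around by itself

# by contrast, creating a fresh iterator inside the loop would restart
# from a (freshly shuffled) first batch on every call:
# next(iter(loader_dict["cond_C"]))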

Also, my data is split something like 50% condition_A, 35% condition_B, and 15% condition_C.

Thus, I wonder: would my code run through, say, all 100 batches of the full dataset and repeat condition_A twice, condition_B nearly three times, and condition_C six times? Or will the code just run through all samples of condition_C and then stop?

For now, cycling through the conditional samples multiple times would suffice.

For later purposes, I would like to consider the following (see the sketch after this list):

  • pick a truly random sample (different in each epoch) from the full dataset
  • cycle through all the conditional loaders' samples
  • terminate the epoch whenever the smallest condition loader has been "cycled through"
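
A hedged sketch of one way these three points could fit together, again using the hypothetical loaders from above; zip() stops as soon as the shortest iterator is exhausted, which matches the last bullet:

# fresh iterators each epoch; shuffle=True gives a new random order per epoch
cond_iters = [iter(loader) for loader in loader_dict.values()]
full_iter = iter(train_loader)

# zip stops when the smallest condition loader runs out of batches;
# the full loader has at least as many batches, so next() below is safe
for cond_batches in zip(*cond_iters):
    random_batch = next(full_iter)
    # ... net shenanigans with cond_batches and random_batch ...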

Answer 1

Score: 0

I made the experiment myself. It will behave like itertools.cycle().
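
One caveat worth noting: itertools.cycle() caches the elements of its first pass, so wrapping a shuffled DataLoader in it would replay the same batch order forever. If the wrap-around should be explicit rather than relying on observed behavior, a small generator that re-iterates the loader (reshuffling on each pass when shuffle=True) is a common pattern. A minimal sketch, assuming the loader_dict and train_loader from the question:

def cycle(loader):
    # re-iterate the DataLoader forever; each pass reshuffles if shuffle=True
    while True:
        yield from loader

cond_iters = [cycle(loader) for loader in loader_dict.values()]
for batch in train_loader:
    condit_sample = [next(it) for it in cond_iters]  # never raises StopIteration
    # ... training step ...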
