July 10, 2023, 22:37:36

DataLoader error: RuntimeError: stack expects each tensor to be equal size, but got [1024] at entry 0 and [212] at entry 13

Question

I have a dataset with a column named input_ids that I'm loading with a DataLoader:

train_batch_size = 2
eval_dataloader = DataLoader(val_dataset, batch_size=train_batch_size)

The length of eval_dataloader is:

print(len(eval_dataloader))
>>> 1623

I'm getting the error when I run:

for step, batch in enumerate(eval_dataloader):
    print(step)
>>> 1,2... ,1621

Each sequence is supposed to have length 1024. If I change train_batch_size to 1, the error disappears.

I tried removing the last batch with:

eval_dataloader = DataLoader(val_dataset, batch_size=train_batch_size, drop_last=True)

But the error still occurs with any batch size greater than 1.

The complete stack trace:

RuntimeError                              Traceback (most recent call last)
Cell In[34], line 2
      1 eval_dataloader = DataLoader(val_dataset,shuffle=True,batch_size=2,drop_last=True)
----> 2 for step, batch in enumerate(eval_dataloader):
      3     print(step, batch['input_ids'].shape)
File ~/anaconda3/envs/cilm/lib/python3.10/site-packages/torch/utils/data/dataloader.py:628, in _BaseDataLoaderIter.__next__(self)
    625 if self._sampler_iter is None:
    626     # TODO(https://github.com/pytorch/pytorch/issues/76750)
    627     self._reset()  # type: ignore[call-arg]
--> 628 data = self._next_data()
    629 self._num_yielded += 1
    630 if self._dataset_kind == _DatasetKind.Iterable and \
    631         self._IterableDataset_len_called is not None and \
    632         self._num_yielded > self._IterableDataset_len_called:
File ~/anaconda3/envs/cilm/lib/python3.10/site-packages/torch/utils/data/dataloader.py:671, in _SingleProcessDataLoaderIter._next_data(self)
    669 def _next_data(self):
    670     index = self._next_index()  # may raise StopIteration
--> 671     data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
    672     if self._pin_memory:
    673         data = _utils.pin_memory.pin_memory(data, self._pin_memory_device)
File ~/anaconda3/envs/cilm/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py:61, in _MapDatasetFetcher.fetch(self, possibly_batched_index)
     59 else:
     60     data = self.dataset[possibly_batched_index]
---> 61 return self.collate_fn(data)
File ~/anaconda3/envs/cilm/lib/python3.10/site-packages/torch/utils/data/_utils/collate.py:265, in default_collate(batch)
    204 def default_collate(batch):
    205     r"""
    206         Function that takes in a batch of data and puts the elements within the batch
    207         into a tensor with an additional outer dimension - batch size. The exact output type can be
   (...)
    263             >>> default_collate(batch)  # Handle `CustomType` automatically
    264     """
--> 265     return collate(batch, collate_fn_map=default_collate_fn_map)
File ~/anaconda3/envs/cilm/lib/python3.10/site-packages/torch/utils/data/_utils/collate.py:128, in collate(batch, collate_fn_map)
    126 if isinstance(elem, collections.abc.Mapping):
    127     try:
--> 128         return elem_type({key: collate([d[key] for d in batch], collate_fn_map=collate_fn_map) for key in elem})
    129     except TypeError:
    130         # The mapping type may not support `__init__(iterable)`.
    131         return {key: collate([d[key] for d in batch], collate_fn_map=collate_fn_map) for key in elem}
File ~/anaconda3/envs/cilm/lib/python3.10/site-packages/torch/utils/data/_utils/collate.py:128, in <dictcomp>(.0)
    126 if isinstance(elem, collections.abc.Mapping):
    127     try:
--> 128         return elem_type({key: collate([d[key] for d in batch], collate_fn_map=collate_fn_map) for key in elem})
    129     except TypeError:
    130         # The mapping type may not support `__init__(iterable)`.
    131         return {key: collate([d[key] for d in batch], collate_fn_map=collate_fn_map) for key in elem}
File ~/anaconda3/envs/cilm/lib/python3.10/site-packages/torch/utils/data/_utils/collate.py:120, in collate(batch, collate_fn_map)
    118 if collate_fn_map is not None:
    119     if elem_type in collate_fn_map:
--> 120         return collate_fn_map[elem_type](batch, collate_fn_map=collate_fn_map)
    122     for collate_type in collate_fn_map:
    123         if isinstance(elem, collate_type):
File ~/anaconda3/envs/cilm/lib/python3.10/site-packages/torch/utils/data/_utils/collate.py:163, in collate_tensor_fn(batch, collate_fn_map)
    161     storage = elem.storage()._new_shared(numel, device=elem.device)
    162     out = elem.new(storage).resize_(len(batch), *list(elem.size()))
--> 163 return torch.stack(batch, 0, out=out)
RuntimeError: stack expects each tensor to be equal size, but got [212] at entry 0 and [1024] at entry 1

I found other somewhat similar SO questions, but they seem to be about the stack function in other settings.

A similar issue exists in the train_dataloader:
RuntimeError: stack expects each tensor to be equal size, but got [930] at entry 0 and [1024] at entry 1

Update
Solved it thanks to @chro and this reddit post: "To isolate the problem loop over the items in the dataloader with batch size 1 without shuffle and print the shape of the array you got. Then investigate the ones with different sizes".

It turns out there was a sequence whose length wasn't 1024, and for some reason this doesn't show up unless the batch size is 1. I'm not entirely sure how you can end up with a tensor of tensors with varying lengths, but alas. To resolve the issue, I first filtered my dataset and removed the one sequence whose length was not 1024, then called the DataLoader on the filtered dataset.
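For anyone landing here later, the filtering step could look roughly like the sketch below. It is only an illustration: ToyDataset is a hypothetical stand-in for val_dataset (the real dataset class isn't shown above), and the items are assumed to be dicts holding a 1-D input_ids tensor, as the stack trace suggests. torch.utils.data.Subset is used to keep only the indices whose length matches.

```python
import torch
from torch.utils.data import DataLoader, Dataset, Subset


class ToyDataset(Dataset):
    """Hypothetical stand-in for val_dataset: each item is a dict with an 'input_ids' tensor."""

    def __init__(self, lengths):
        self.items = [{"input_ids": torch.zeros(n, dtype=torch.long)} for n in lengths]

    def __len__(self):
        return len(self.items)

    def __getitem__(self, idx):
        return self.items[idx]


expected_len = 1024
val_dataset = ToyDataset([1024, 1024, 212, 1024, 1024])  # one bad sequence at index 2

# Keep only the indices whose sequence has the expected length.
keep = [
    i for i in range(len(val_dataset))
    if val_dataset[i]["input_ids"].shape[0] == expected_len
]
filtered = Subset(val_dataset, keep)

# With the odd-length item gone, default collation can stack every batch.
eval_dataloader = DataLoader(filtered, batch_size=2)
batch_shapes = [tuple(batch["input_ids"].shape) for batch in eval_dataloader]
print(batch_shapes)
```

If val_dataset is a Hugging Face datasets object instead, its own filter method would presumably be the more natural way to do the same thing.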

Answer 1

Score: 1


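Could you debug it with the following? (batch['input_ids'] matches the key used in the question's stack trace; replace it with whatever is relevant to your data.)

from torch.utils.data import DataLoader

# Batch size 1 and no shuffling, so each step maps to one dataset index.
eval_dataloader = DataLoader(val_dataset, batch_size=1)
for step, batch in enumerate(eval_dataloader):
    shape = batch['input_ids'].shape  # (1, sequence_length)
    if shape[1] != 1024:
        print(step, shape)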

My idea is to check the following:

Does it fail on the same item in the dataset?
What is the shape of the item it fails on?

I usually see this error when the DataLoader stacks several elements into a batch and some of those elements have a different size.

Please also post the complete stack trace for the problem.
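To make that concrete, here is a minimal sketch that reproduces the message outside of any DataLoader. The two tensors are made up for illustration (lengths 1024 and 212, as in the question). The default collate function ends up calling torch.stack, which works for a single tensor of any length but fails as soon as two tensors of different sizes land in the same batch. That is also why batch size 1 hides the problem, and why drop_last=True doesn't help: it only drops an incomplete final batch.

```python
import torch

# Hypothetical sequences with the lengths reported in the question.
full = torch.zeros(1024, dtype=torch.long)
short = torch.zeros(212, dtype=torch.long)

# A "batch" of one: there is nothing to compare against, so stacking succeeds.
single = torch.stack([short], 0)
print(single.shape)

# A batch of two with mismatched lengths: stacking raises the RuntimeError.
try:
    torch.stack([short, full], 0)
    error_message = None
except RuntimeError as exc:
    error_message = str(exc)
print(error_message)
```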

Update:
To resolve the issue, filter the dataset first and remove the one sequence whose length differs from the others. Then call the DataLoader on the filtered dataset.
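If dropping the sequence isn't acceptable, a different approach (not what was done in this thread) is to pad each batch to a common length with a custom collate_fn. The sketch below is only that, a sketch: pad_collate is a hypothetical helper, the padding id 0 and the extra attention_mask key are assumptions, and the items are again assumed to be dicts with a 1-D input_ids tensor. It relies on torch.nn.utils.rnn.pad_sequence.

```python
import torch
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader


def pad_collate(batch, pad_id=0):
    """Hypothetical collate_fn: pad 'input_ids' to the longest sequence in the batch."""
    seqs = [item["input_ids"] for item in batch]
    lengths = torch.tensor([s.shape[0] for s in seqs])
    padded = pad_sequence(seqs, batch_first=True, padding_value=pad_id)
    # Mask is 1 for real tokens and 0 for padding positions.
    positions = torch.arange(padded.shape[1])[None, :]
    attention_mask = (positions < lengths[:, None]).long()
    return {"input_ids": padded, "attention_mask": attention_mask}


# Toy items with mismatched lengths; a plain list works as a map-style dataset.
toy_items = [{"input_ids": torch.ones(n, dtype=torch.long)} for n in (1024, 212)]
loader = DataLoader(toy_items, batch_size=2, collate_fn=pad_collate)

batch = next(iter(loader))
print(batch["input_ids"].shape, batch["attention_mask"].sum(dim=1))
```

Whether padding is appropriate depends on the model; it has to ignore the padded positions, for example through the mask.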
