How to slice an existing tf.data dataset into elements of a new dataset


Question

How do I create a tf.data dataset out of an existing tf.data dataset whose elements consist of two four-dimensional arrays? I have a dataset of images and corresponding segmentation masks, so I create a tf.data dataset from the image and mask paths and apply some preprocessing functions to it. After this step the images and masks have shapes [x, h, w, c] and [x, h, w, c], so when using dataset.as_numpy_iterator() I get two arrays of these shapes. Now I want to create a dataset whose elements will be two arrays of shape [h, w, c] and [h, w, c], where each slice along the first dimension becomes a separate element of the dataset. So if my dataset initially had 10 elements, it should now have 10 * x elements. But I am not able to slice the elements out of the existing dataset. This is what I have tried:

dataset = tf.data.Dataset.from_tensor_slices((imagepath, maskpath))
dataset = dataset.map(lambda imagepath, maskpath: tf.py_function(preprocessData, 
                                                inp=[imagepath, maskpath], 
                                                Tout=[tf.float64]*2))
datasetnew = tf.data.Dataset.from_tensor_slices(dataset)

The error I get is:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/tmp/ipykernel_34/343231121.py in <module>
      3                                                 inp=[flairimg_val, msk_val],
      4                                                 Tout=[tf.float64]*2))
----> 5 datasetnew = tf.data.Dataset.from_tensor_slices(datasetval)
      6 # datasetval = datasetval.map(lambda flairimg_val, msk_val, path: get_2p5D_repre(flairimg_val, msk_val, path))
      7 # datasetval = datasetval.map(lambda flairimg_val, msk_val, path: try_return(flairimg_val, msk_val, path))

/usr/local/lib/python3.8/dist-packages/tensorflow/python/data/ops/dataset_ops.py in from_tensor_slices(tensors)
    758       Dataset: A `Dataset`.
    759     """
--> 760     return TensorSliceDataset(tensors)
    761 
    762   class _GeneratorState(object):

/usr/local/lib/python3.8/dist-packages/tensorflow/python/data/ops/dataset_ops.py in __init__(self, element)
   3320     element = structure.normalize_element(element)
   3321     batched_spec = structure.type_spec_from_value(element)
-> 3322     self._tensors = structure.to_batched_tensor_list(batched_spec, element)
   3323     self._structure = nest.map_structure(
   3324         lambda component_spec: component_spec._unbatch(), batched_spec)  # pylint: disable=protected-access

/usr/local/lib/python3.8/dist-packages/tensorflow/python/data/util/structure.py in to_batched_tensor_list(element_spec, element)
    362   # pylint: disable=protected-access
    363   # pylint: disable=g-long-lambda
--> 364   return _to_tensor_list_helper(
    365       lambda state, spec, component: state + spec._to_batched_tensor_list(
    366           component), element_spec, element)

/usr/local/lib/python3.8/dist-packages/tensorflow/python/data/util/structure.py in _to_tensor_list_helper(encode_fn, element_spec, element)
    337     return encode_fn(state, spec, component)
    338 
--> 339   return functools.reduce(
    340       reduce_fn, zip(nest.flatten(element_spec), nest.flatten(element)), [])
    341 

/usr/local/lib/python3.8/dist-packages/tensorflow/python/data/util/structure.py in reduce_fn(state, value)
    335   def reduce_fn(state, value):
    336     spec, component = value
--> 337     return encode_fn(state, spec, component)
    338 
    339   return functools.reduce(

/usr/local/lib/python3.8/dist-packages/tensorflow/python/data/util/structure.py in <lambda>(state, spec, component)
    363   # pylint: disable=g-long-lambda
    364   return _to_tensor_list_helper(
--> 365       lambda state, spec, component: state + spec._to_batched_tensor_list(
    366           component), element_spec, element)

/usr/local/lib/python3.8/dist-packages/tensorflow/python/data/ops/dataset_ops.py in _to_batched_tensor_list(self, value)
   3492   def _to_batched_tensor_list(self, value):
   3493     if self._dataset_shape.ndims == 0:
-> 3494       raise ValueError("Unbatching a dataset is only supported for rank >= 1")
   3495     return self._to_tensor_list(value)
   3496 

ValueError: Unbatching a dataset is only supported for rank >= 1

I am not sure what "rank" means here for a dataset. How do I achieve this?

Here, "rank" refers to the number of dimensions of a value, and unbatching requires rank >= 1. tf.data.Dataset.from_tensor_slices slices ("unbatches") its input along the first dimension, but a Dataset object passed to it is treated as a scalar of rank 0, so there is no dimension to slice along. To achieve your goal, you could instead split the data into two separate datasets and then combine them with tf.data.Dataset.zip. Here is one possible approach:

image_dataset = tf.data.Dataset.from_tensor_slices(imagepath)
mask_dataset = tf.data.Dataset.from_tensor_slices(maskpath)

# Apply the same preprocessing to both datasets (this assumes
# preprocessData can be called with a single path at a time)
image_dataset = image_dataset.map(lambda imagepath: tf.py_function(preprocessData, inp=[imagepath], Tout=tf.float64))
mask_dataset = mask_dataset.map(lambda maskpath: tf.py_function(preprocessData, inp=[maskpath], Tout=tf.float64))

# Combine the datasets, then split each [x, h, w, c] pair along the
# first axis so that every slice becomes its own element
combined_dataset = tf.data.Dataset.zip((image_dataset, mask_dataset)).unbatch()

Zipping pairs the two datasets element by element, and unbatch then turns each slice along the first dimension into a separate element, so the elements of combined_dataset are pairs of arrays of shape [h, w, c]. If the original dataset had 10 elements, combined_dataset will now have 10 * x elements.
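
As a quick sanity check, you can pull a few elements and inspect their shapes (this assumes preprocessData returns a float64 tensor of shape [x, h, w, c] for each path):

# Each element should now be a single image/mask pair of rank 3
for image, mask in combined_dataset.take(2):
    print(image.shape, mask.shape)  # expected: (h, w, c) and (h, w, c)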


Answer 1

Score: 1


You're looking for the unbatch method:

> Splits elements of a dataset into multiple elements.
>
> For example, if elements of the dataset are shaped [B, a0, a1, ...],
> where B may vary for each input element, then for each element in the
> dataset, the unbatched dataset will contain B consecutive elements of
> shape [a0, a1, ...].
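
Applied to the pipeline from the question, a minimal sketch could look like this (preprocessData, imagepath and maskpath are the names from the question; the set_shape step is an assumption I am adding, since tf.py_function discards static shape information):

import tensorflow as tf

dataset = tf.data.Dataset.from_tensor_slices((imagepath, maskpath))
dataset = dataset.map(lambda img, msk: tf.py_function(preprocessData,
                                                      inp=[img, msk],
                                                      Tout=[tf.float64] * 2))

# tf.py_function leaves output shapes unknown, so restore the rank to let
# downstream stages know each element is a pair of 4-D [x, h, w, c] tensors.
def restore_rank(image, mask):
    image.set_shape([None, None, None, None])
    mask.set_shape([None, None, None, None])
    return image, mask

dataset = dataset.map(restore_rank)

# unbatch() splits each [x, h, w, c] pair along the first axis, yielding
# x consecutive ([h, w, c], [h, w, c]) elements per original element.
dataset = dataset.unbatch()

Note that unbatch operates on the existing dataset directly, so there is no need to route it back through tf.data.Dataset.from_tensor_slices.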
