How to slice an existing tf.data dataset into elements of a new dataset
Question

How to create a tf.data dataset out of an existing tf.data dataset whose elements consist of two four-dimensional arrays? I have a dataset of images and corresponding segmentation masks. I create a tf.data dataset from the image and mask paths and apply some preprocessing functions to the dataset. After this step the images and masks have shapes [x,h,w,c] and [x,h,w,c], so when using dataset.as_numpy_iterator() I get two arrays of these shapes. Now I want to create a dataset whose elements are two arrays of shape [h,w,c] and [h,w,c], where each slice along the first dimension becomes a separate element of the dataset. So if my dataset initially had 10 elements, it should now have 10 * x elements. But I am not able to slice the elements out of the existing dataset. This is what I have tried:
dataset = tf.data.Dataset.from_tensor_slices((imagepath, maskpath))
dataset = dataset.map(lambda imagepath, maskpath: tf.py_function(preprocessData,
                                                                 inp=[imagepath, maskpath],
                                                                 Tout=[tf.float64]*2))
datasetnew = tf.data.Dataset.from_tensor_slices(dataset)
The error you get is:

ValueError: Unbatching a dataset is only supported for rank >= 1

Here, "rank" refers to the dimensionality of the dataset's elements: an element must have at least one dimension for the "unbatching" operation to work. To achieve your goal, you could consider splitting the data into two separate datasets and then combining them with tf.data.Dataset.zip to get the desired result. Here is one possible approach:
image_dataset = tf.data.Dataset.from_tensor_slices(imagepath)
mask_dataset = tf.data.Dataset.from_tensor_slices(maskpath)
# Apply the same preprocessing to both datasets
image_dataset = image_dataset.map(lambda imagepath: tf.py_function(preprocessData, inp=[imagepath], Tout=tf.float64))
mask_dataset = mask_dataset.map(lambda maskpath: tf.py_function(preprocessData, inp=[maskpath], Tout=tf.float64))
# Combine the datasets
combined_dataset = tf.data.Dataset.zip((image_dataset, mask_dataset))
This creates a dataset whose elements are two arrays of shape [h,w,c], where each slice along the first dimension becomes a separate element of the dataset. If the original dataset had 10 elements, combined_dataset will now have 10 * x elements.
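As a sanity check on the combining step, here is a tiny self-contained sketch of how tf.data.Dataset.zip pairs two datasets element-wise (toy integer datasets are assumed here in place of real image and mask datasets):

```python
import tensorflow as tf

# Two toy datasets standing in for the image and mask datasets.
a = tf.data.Dataset.from_tensor_slices([1, 2, 3])
b = tf.data.Dataset.from_tensor_slices([10, 20, 30])

# zip pairs the datasets element-wise, like Python's built-in zip.
pairs = tf.data.Dataset.zip((a, b))
result = [(int(u), int(v)) for u, v in pairs]
print(result)  # [(1, 10), (2, 20), (3, 30)]
```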
The full traceback of the error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
/tmp/ipykernel_34/343231121.py in <module>
3 inp=[flairimg_val, msk_val],
4 Tout=[tf.float64]*2))
----> 5 datasetnew = tf.data.Dataset.from_tensor_slices(datasetval)
6 # datasetval = datasetval.map(lambda flairimg_val, msk_val, path: get_2p5D_repre(flairimg_val, msk_val, path))
7 # datasetval = datasetval.map(lambda flairimg_val, msk_val, path: try_return(flairimg_val, msk_val, path))
/usr/local/lib/python3.8/dist-packages/tensorflow/python/data/ops/dataset_ops.py in from_tensor_slices(tensors)
758 Dataset: A `Dataset`.
759 """
--> 760 return TensorSliceDataset(tensors)
761
762 class _GeneratorState(object):
/usr/local/lib/python3.8/dist-packages/tensorflow/python/data/ops/dataset_ops.py in __init__(self, element)
3320 element = structure.normalize_element(element)
3321 batched_spec = structure.type_spec_from_value(element)
-> 3322 self._tensors = structure.to_batched_tensor_list(batched_spec, element)
3323 self._structure = nest.map_structure(
3324 lambda component_spec: component_spec._unbatch(), batched_spec) # pylint: disable=protected-access
/usr/local/lib/python3.8/dist-packages/tensorflow/python/data/util/structure.py in to_batched_tensor_list(element_spec, element)
362 # pylint: disable=protected-access
363 # pylint: disable=g-long-lambda
--> 364 return _to_tensor_list_helper(
365 lambda state, spec, component: state + spec._to_batched_tensor_list(
366 component), element_spec, element)
/usr/local/lib/python3.8/dist-packages/tensorflow/python/data/util/structure.py in _to_tensor_list_helper(encode_fn, element_spec, element)
337 return encode_fn(state, spec, component)
338
--> 339 return functools.reduce(
340 reduce_fn, zip(nest.flatten(element_spec), nest.flatten(element)), [])
341
/usr/local/lib/python3.8/dist-packages/tensorflow/python/data/util/structure.py in reduce_fn(state, value)
335 def reduce_fn(state, value):
336 spec, component = value
--> 337 return encode_fn(state, spec, component)
338
339 return functools.reduce(
/usr/local/lib/python3.8/dist-packages/tensorflow/python/data/util/structure.py in <lambda>(state, spec, component)
363 # pylint: disable=g-long-lambda
364 return _to_tensor_list_helper(
--> 365 lambda state, spec, component: state + spec._to_batched_tensor_list(
366 component), element_spec, element)
367
/usr/local/lib/python3.8/dist-packages/tensorflow/python/data/ops/dataset_ops.py in _to_batched_tensor_list(self, value)
3492 def _to_batched_tensor_list(self, value):
3493 if self._dataset_shape.ndims == 0:
-> 3494 raise ValueError("Unbatching a dataset is only supported for rank >= 1")
3495 return self._to_tensor_list(value)
3496
ValueError: Unbatching a dataset is only supported for rank >= 1
Not sure what the rank part means here for a dataset? How to achieve this?
Answer 1
Score: 1
You're looking for the unbatch method:
> Splits elements of a dataset into multiple elements.
>
> For example, if elements of the dataset are shaped [B, a0, a1, ...],
> where B may vary for each input element, then for each element in the
> dataset, the unbatched dataset will contain B consecutive elements of
> shape [a0, a1, ...].
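Applied to the situation in the question, a minimal runnable sketch (with random arrays standing in for the preprocessed images and masks, and toy sizes x=4, h=w=8, c=1 assumed):

```python
import numpy as np
import tensorflow as tf

# Toy stand-ins for the preprocessed data: 10 (image, mask) pairs,
# each of shape [x, h, w, c] with x = 4 slices per pair.
x, h, w, c = 4, 8, 8, 1
images = np.random.rand(10, x, h, w, c)
masks = np.random.rand(10, x, h, w, c)

dataset = tf.data.Dataset.from_tensor_slices((images, masks))

# Each element is a ([x,h,w,c], [x,h,w,c]) pair; unbatch splits along
# the first dimension, yielding ([h,w,c], [h,w,c]) pairs.
unbatched = dataset.unbatch()

print(sum(1 for _ in dataset))    # 10
print(sum(1 for _ in unbatched))  # 40 == 10 * x
img, msk = next(iter(unbatched))
print(img.shape, msk.shape)       # (8, 8, 1) (8, 8, 1)
```

Note that tensors returned by tf.py_function have unknown static shape, so in the original pipeline you may need to restore the shapes first (e.g. with tf.ensure_shape or tensor.set_shape inside a following map) before unbatch can split along the first dimension.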