Tensorflow.data.Dataset.rejection_resample modifies my dataset's element_spec

Question

I am trying to use tf.data.Dataset.rejection_resample to balance my dataset, but I am running into an issue in which the method modifies the element_spec of my dataset, making it incompatible with my models.
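
For reference, the resampling is applied before batching; a minimal sketch of the kind of call I mean is below. The stand-in dataset, `class_func`, and `target_dist` here are illustrative placeholders rather than my exact pipeline:

```python
import tensorflow as tf

# Stand-in for the unbatched (features, label) dataset described below;
# only the structure matters, the values are random.
features = {'input_A': tf.random.normal((100, 900, 1), dtype=tf.float64),
            'input_B': tf.random.normal((100, 900, 1), dtype=tf.float64)}
labels = tf.random.uniform((100, 1, 1), minval=0, maxval=2, dtype=tf.int64)
dataset = tf.data.Dataset.from_tensor_slices((features, labels))

# Placeholder class_func: maps one (features, label) element to a scalar class id.
def class_func(features, label):
    return tf.cast(tf.reshape(label, []), tf.int32)

# Placeholder target distribution: resample towards a 50/50 class balance.
resampled = dataset.rejection_resample(class_func=class_func,
                                       target_dist=[0.5, 0.5])
```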

The original element spec of my dataset is:

({'input_A': TensorSpec(shape=(None, 900, 1), dtype=tf.float64, name=None),
  'input_B': TensorSpec(shape=(None, 900, 1), dtype=tf.float64, name=None)},
 TensorSpec(shape=(None, 1, 1), dtype=tf.int64, name=None))

This is the element spec after batching.

However, if I run rejection_resample (before batching), the element spec at the end becomes:

(TensorSpec(shape=(None,), dtype=tf.int64, name=None),
 ({'input_A': TensorSpec(shape=(None, 900, 1), dtype=tf.float64, name=None),
   'input_B': TensorSpec(shape=(None, 900, 1), dtype=tf.float64, name=None)},
  TensorSpec(shape=(None, 1, 1), dtype=tf.int64, name=None)))

So rejection_resample is adding another tf.int64 tensor at the beginning of my data, and I can't figure out what it is for. My problem is that this breaks compatibility between the input data and my model, since the model depends on the original input tuple.

Furthermore, it also causes an inconsistency between the training and validation data. I was expecting to apply rejection_resample only to the training data, but if I do that, the training dataset will have the added tensor, while the validation one won't.

So my question is: what is this tensor that gets added to the element spec, and is there any way to drop an element from the dataset after building it? Thank you.

# Answer 1

**Score**: 1

I can't tell you where the added tensor comes from, but here is an example of how to remove/drop it from your dataset:

```python
import tensorflow as tf
import numpy as np

# create a sample dataset that's similar to your 'wrong' output
ds = tf.data.Dataset.from_tensor_slices(
    (np.arange(-10, 0), (tf.constant(np.arange(10)), tf.constant(np.arange(10, 20)))))
# remove the new 'wrong' tensor
dds = ds.map(lambda x, y: y)
# check the new dataset
for i in dds.take(2):
    print(i)
```

Keep in mind that this is only a workaround and doesn't remove the source that causes the additional tensor.
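
Applied to the question's pipeline, the same one-line map can sit directly after the rejection_resample call and before batching, so that only the training data is resampled and its element_spec still matches the validation data. This is only a sketch, assuming `resampled` is the output of a rejection_resample call like the one sketched in the question, and `batch_size` is a placeholder:

```python
# `resampled` yields (extra_label, (features, label)) elements, as seen in the question's spec;
# keep only the original (features, label) part, then batch as before.
train_balanced = resampled.map(lambda extra_label, data: data)
train_balanced = train_balanced.batch(batch_size)  # batch_size is a placeholder
```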


# Answer 2
**Score**: 1

Suppose I have created the same dataset as yours:

```python
import tensorflow as tf

x = tf.random.normal((7000, 900, 1))
y = tf.random.normal((7000, 900, 1))
z = tf.random.uniform((7000, 1, 1), 1, 2, dtype=tf.int32)

# now convert it to a tf.data.Dataset object
dataset = tf.data.Dataset.from_tensor_slices(((x, y), z))

func = lambda x, y: ({'input_A': x[0], 'input_B': x[1]}, y)
dataset = dataset.map(func)
```

After mapping, my dataset will look exactly like yours:

<MapDataset element_spec=({'input_A': TensorSpec(shape=(900, 1), dtype=tf.float32, name=None), 'input_B': TensorSpec(shape=(900, 1), dtype=tf.float32, name=None)}, TensorSpec(shape=(1, 1), dtype=tf.int32, name=None))>

Now, I need to remove this last tensor:

```python
disjoint_func = lambda x, y: x
dataset = dataset.map(disjoint_func)
```

Now, the extra tensor has been removed:

<MapDataset element_spec={'input_A': TensorSpec(shape=(900, 1), dtype=tf.float32, name=None), 'input_B': TensorSpec(shape=(900, 1), dtype=tf.float32, name=None)}>

Hope this helps you understand the code.
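
As a quick sanity check (a sketch, continuing from the `dataset` in the snippet above; the batch size is arbitrary), batching the mapped dataset reproduces the `(None, 900, 1)` shapes from the question:

```python
batched = dataset.batch(32)  # 32 is an arbitrary batch size for illustration
print(batched.element_spec)
# {'input_A': TensorSpec(shape=(None, 900, 1), dtype=tf.float32, name=None),
#  'input_B': TensorSpec(shape=(None, 900, 1), dtype=tf.float32, name=None)}
```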

Posted by huangapple on 2023-02-06 09:50:46. Original link: https://go.coder-hub.com/75356723.html