June 29, 2023 16:02:31

TensorFlow Keras: Input is empty. [[{{node decode_image/DecodeImage}}]] [[IteratorGetNext]] [Op:__inference_train_function_2877]

Question

I am using a modified version of the TensorFlow Image Classification tutorial found at [this link][1]. I will attach the code that I have at the bottom of the post.

I am trying to use this model to classify images on a [much larger dataset][2] that has pictures of shapes. This dataset is ~23 times the size of the original one in the tutorial, which therefore takes much more computing power to train the model. In order to not hurt my poor, little laptop, I moved the job over to a Google Compute Engine Virtual Machine (8 cores, 32GB of RAM).

The model that I have attached below runs through all of the preliminary steps (importing the dataset, structuring the model, etc.). After all of these steps, it begins the training sequence. This seems like all is fine and well...

However, after about 60-90% of the way through the first epoch, it throws the following exception:

This is a strange error to me because there seems to be no issue starting the training process and there doesn't seem to be a set spot in the first epoch where the training errors out. One difference that I noted (and I believe I addressed) is that the image files are .png in this dataset compared to the .jpg in the original dataset.
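As far as I know, `DecodeImage` reports `Input is empty.` when the bytes it is handed have length zero, so one thing worth ruling out before blaming the `.png` format is an empty or truncated file somewhere in the dataset. A minimal stdlib-only sketch, assuming the `*/*.png` layout from the file structure in this post; `find_suspect_images` is a hypothetical helper, not part of TensorFlow or Keras, and it only reads the first 8 bytes of each file:

```python
from pathlib import Path

# File signatures ("magic numbers") that valid PNG and JPEG files start with.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
JPEG_SIGNATURE = b"\xff\xd8\xff"


def find_suspect_images(data_dir, pattern="*/*.png"):
    """Hypothetical helper: return (path, reason) pairs for files that are
    empty or whose first bytes match neither the PNG nor the JPEG signature."""
    suspects = []
    for path in sorted(Path(data_dir).glob(pattern)):
        with open(path, "rb") as f:
            header = f.read(len(PNG_SIGNATURE))
        if not header:
            suspects.append((path, "empty file"))
        elif not header.startswith((PNG_SIGNATURE, JPEG_SIGNATURE)):
            suspects.append((path, "unrecognized header"))
    return suspects
```

Calling `find_suspect_images("/MOUNT_HD1/gschindl/datasets/new_2d_shapes")` and removing or re-exporting anything it lists would be a cheap check before retraining; a clean result just means the cause is elsewhere.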

------------------------------

As promised, the dataset file structure and code:

Dataset File Structure:

```
|
|-new_2d_shapes
   |-Square
   |    |-Square_562aecd2-2a86-11ea-8123-8363a7ec19e6.png
   |    |-Square_a9df2a7c-2a96-11ea-8123-8363a7ec19e6.png
   |    |-....
   |-Triangle
   |     |-Triangle_5624fb26-2a89-11ea-8123-8363a7ec19e6.png
   |     |-Triangle_56dd1ee8-2a8d-11ee-8123-8363a7ec19e6.png
   |     |-....
   |-Pentagon
   |    |-Pentagon_aa06095a-2a85-11ea-8123-8363a7ec19e6.png
   |    |-Pentagon_a9fca126-2a94-11ea-8123-8363a7ec19e6.png
   |    |-....
   |-Hexagon
        |-Hexagon_ffff21c6-2a8e-11ea-8123-8363a7ec19e6.png
        |-Hexagon_a9eb022a-2a8c-11ea-8123-8363a7ec19e6.png
        |-....
```
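`image_dataset_from_directory` infers one label per subfolder, so this layout should give four classes, listed in alphanumeric order (`Hexagon`, `Pentagon`, `Square`, `Triangle`) unless `class_names` is passed explicitly. A quick stdlib-only sanity check of the per-class counts before training might look like this; `count_images_per_class` is a hypothetical helper:

```python
from pathlib import Path


def count_images_per_class(data_dir, suffix=".png"):
    """Hypothetical helper: map each class subfolder to its number of image files.

    Folders are visited in sorted order, matching the alphanumeric order Keras
    uses for class_names when labels are inferred. The suffix check is
    case-insensitive, so ".PNG" files are counted too.
    """
    counts = {}
    for class_dir in sorted(p for p in Path(data_dir).iterdir() if p.is_dir()):
        counts[class_dir.name] = sum(
            1 for f in class_dir.iterdir()
            if f.is_file() and f.suffix.lower() == suffix
        )
    return counts
```

Calling it on `/MOUNT_HD1/gschindl/datasets/new_2d_shapes` returns a dict keyed by class folder, which makes a badly unbalanced or nearly empty class easy to spot.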

Code:

(Notice that I commented out the portion of code responsible for configuring the dataset for performance because I thought that might be an issue. The visualization is also commented out because I am working over an SSH connection.)

```python
# %%
# Running all of the imported packages
import sklearn
import matplotlib.pyplot as plt
import numpy as np
import PIL
# Notice that this import takes a while
# This is amplified if using a virtual environment
print("Beginning to import tensorflow...")
import tensorflow as tf
print("tensorflow has been imported.")
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.models import Sequential
import pathlib
# %%
# Used for importing the dataset off of the web
# dataset_url = "https://data.mendeley.com/datasets/wzr2yv7r53/1"
# print("Stuck1")
# # Should print "data_dir: C:\Users\Garrett\.keras\datasets\flower_photos.tar"
# data_dir = tf.keras.utils.get_file('2D_geo_shape.tar', origin=dataset_url, extract=True)
# print("data_dir: {}".format(data_dir))
data_dir = "/MOUNT_HD1/gschindl/datasets/new_2d_shapes"
# Should print "data_dir: C:\Users\Garrett\.keras\datasets\flower_photos"
data_dir = pathlib.Path(data_dir).with_suffix('')
print("data_dir: {}".format(data_dir))
image_data = list(data_dir.glob('*/*.png'))
image_count = len(image_data)
print("Number of images found: {}".format(image_count))
# %%
# Sets parameters for the loader
batch_size = 288
img_height = 180
img_width = 180
# %%
# Beginning the splitting and Finding the class names from the training set
# It's good practice to use a validation split when developing your model.
# Use 80% of the images for training and 20% for validation.
print("Beginning the splitting and Finding the class names from the training set")
train_ds = tf.keras.utils.image_dataset_from_directory(
  data_dir,
  validation_split=0.2,
  subset="training",
  seed=123,
  image_size=(img_height, img_width),
  batch_size=batch_size)
val_ds = tf.keras.utils.image_dataset_from_directory(
  data_dir,
  validation_split=0.2,
  subset="validation",
  seed=123,
  image_size=(img_height, img_width),
  batch_size=batch_size)
```
<details>
<summary>English:</summary>

I am using a modified version of the TensorFlow Image Classification tutorial found at [this link][1]. I will attach the code that I have at the bottom of the post. 
I am trying to use this model to classify images on a [much larger dataset][2] that has pictures of shapes. This dataset is ~23 times the size of the original one in the tutorial, which therefore takes much more computing power to train the model. In order to not hurt my poor, little laptop, I moved the job over to a Google Compute Engine Virtual Machine (8 cores, 32GB of RAM). 
The model that I have attached below runs through all of the preliminary steps (importing the dataset, structuring the model, etc.). After all of these steps, it begins the training sequence. This seems like all is fine and well...

```
Epoch 1/20
200/304 [==================>...........] - ETA: 5:23 - loss: 2.1112 - accuracy: 0.1773
```


However, after about 60-90% of the way through the first epoch, it throws the following exception:

```
224/304 [=====================>........] - ETA: 4:09 - loss: 2.1010 - accuracy: 0.1820
2023-06-29 07:34:04.667705: I tensorflow/core/common_runtime/executor.cc:1197] [/job:localhost/replica:0/task:0/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: Input is empty.
  [[{{node decode_image/DecodeImage}}]]
  [[IteratorGetNext]]
Traceback (most recent call last):
  File "/MOUNT_HD1/gschindl/code/GeoShapeFull.py", line 215, in <module>
    history = drop_model.fit(
  File "/home/gschindl/.local/lib/python3.9/site-packages/keras/utils/traceback_utils.py", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/home/gschindl/.local/lib/python3.9/site-packages/tensorflow/python/eager/execute.py", line 52, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InvalidArgumentError: Graph execution error:

Input is empty.
  [[{{node decode_image/DecodeImage}}]]
  [[IteratorGetNext]] [Op:__inference_train_function_2877]
```


This is a strange error to me because there seems to be no issue starting the training process and there doesn't seem to be a set spot in the first epoch where the training errors out. One difference that I noted (and I believe I addressed) is that the image files are `.png` in this dataset compared to the `.jpg` in the original dataset. 
**------------------------------**
As promised, the dataset file structure and code:
**Dataset File Structure:**


**Code:**
(Notice that I commented out the portion of code responsible for configuring the dataset for performance because I thought that might be an issue. The visualization is also commented out because I am working over an SSH connection.)

```python
# %%
# Running all of the imported packages
import sklearn
import matplotlib.pyplot as plt
import numpy as np
import PIL
# Notice that this import takes a while
# This is amplified if using a virtual environment
print("Beginning to import tensorflow...")
import tensorflow as tf
print("tensorflow has been imported.")
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.models import Sequential
import pathlib

# %%
# Used for importing the dataset off of the web
# dataset_url = "https://data.mendeley.com/datasets/wzr2yv7r53/1"
# print("Stuck1")
# # Should print "data_dir: C:\Users\Garrett\.keras\datasets\flower_photos.tar"
# data_dir = tf.keras.utils.get_file('2D_geo_shape.tar', origin=dataset_url, extract=True)
# print("data_dir: {}".format(data_dir))
data_dir = "/MOUNT_HD1/gschindl/datasets/new_2d_shapes"
# Should print "data_dir: C:\Users\Garrett\.keras\datasets\flower_photos"
data_dir = pathlib.Path(data_dir).with_suffix('')
print("data_dir: {}".format(data_dir))
image_data = list(data_dir.glob('*/*.png'))
image_count = len(list(data_dir.glob('*/*.png')))
print("Number of images found: {}".format(image_count))

# %%
# Sets parameters for the loader
batch_size = 288
img_height = 180
img_width = 180

# %%
# Beginning the splitting and Finding the class names from the training set
# It's good practice to use a validation split when developing your model.
# Use 80% of the images for training and 20% for validation.
print("Beginning the splitting and Finding the class names from the training set")
train_ds = tf.keras.utils.image_dataset_from_directory(
  data_dir,
  validation_split=0.2,
  subset="training",
  seed=123,
  image_size=(img_height, img_width),
  batch_size=batch_size)
val_ds = tf.keras.utils.image_dataset_from_directory(
  data_dir,
  validation_split=0.2,
  subset="validation",
  seed=123,
  image_size=(img_height, img_width),
  batch_size=batch_size)
class_names = train_ds.class_names
print(class_names)

# %%
# Configuring the dataset for performance
#AUTOTUNE = tf.data.AUTOTUNE
#train_ds = train_ds.cache().shuffle(1000).prefetch(buffer_size=AUTOTUNE)
#val_ds = val_ds.cache().prefetch(buffer_size=AUTOTUNE)
#print("Configured.")

# %%
# Standardizing the data
print("\nStandardizing the data")
# Changing the RGB range from [0, 255] to [0, 1] by using tf.keras.layers.Rescaling
normalization_layer = layers.Rescaling(1./255)
normalized_ds = train_ds.map(lambda x, y: (normalization_layer(x), y))
image_batch, labels_batch = next(iter(normalized_ds))
first_image = image_batch[0]
# Notice the pixel values are now in `[0,1]`.
print("\n\nTHE NEW PIXEL VALUES", np.min(first_image), np.max(first_image))
print("Actual image: ", first_image)

# %%
# Creating the model
print("\nCreating the model")
num_classes = len(class_names)
model = Sequential([
  layers.Rescaling(1./255, input_shape=(img_height, img_width, 3)),
  layers.Conv2D(16, 3, padding='same', activation='relu'),
  layers.MaxPooling2D(),
  layers.Conv2D(32, 3, padding='same', activation='relu'),
  layers.MaxPooling2D(),
  layers.Conv2D(64, 3, padding='same', activation='relu'),
  layers.MaxPooling2D(),
  layers.Flatten(),
  layers.Dense(128, activation='relu'),
  layers.Dense(num_classes)
])
print("\n\nCompleted the model creation process, onto compiling the model")

# %%
# Compiling the Model
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

# %%
# Printing the model summary
model.summary()

# %%
# Data augmentation; "creating" more samples to train model on
print("\nBeginning the data augmentation task")
data_augmentation = keras.Sequential(
  [
    layers.RandomFlip("horizontal",
                      input_shape=(img_height,
                                   img_width,
                                   3)),
    layers.RandomRotation(0.1),
    layers.RandomZoom(0.1),
  ]
)

# %%
# Visualizing the data augmentation
#plt.figure(figsize=(10, 10))
#for images, _ in train_ds.take(1):
#  for i in range(9):
#    augmented_images = data_augmentation(images)
#    ax = plt.subplot(3, 3, i + 1)
#    plt.imshow(augmented_images[0].numpy().astype("uint8"))
#    plt.axis("off")

# %%
# Adding in Dropout to a new model "drop_model"
print("\nAdding the dropout to the new 'drop_model' object")
drop_model = Sequential([
  data_augmentation,
  layers.Rescaling(1./255),
  layers.Conv2D(16, 3, padding='same', activation='relu'),
  layers.MaxPooling2D(),
  layers.Conv2D(32, 3, padding='same', activation='relu'),
  layers.MaxPooling2D(),
  layers.Conv2D(64, 3, padding='same', activation='relu'),
  layers.MaxPooling2D(),
  layers.Dropout(0.2),
  layers.Flatten(),
  layers.Dense(128, activation='relu'),
  layers.Dense(num_classes, name="outputs")
])

# %%
# Compiling the drop_model network and training it
print("\nCompiling the drop_model network")
drop_model.compile(optimizer='adam',
                   loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                   metrics=['accuracy'])
drop_model.summary()

print("\n\nBeginning the training on drop_model\n")
epochs = 20
history = drop_model.fit(
  train_ds,
  validation_data=val_ds,
  epochs=epochs,
  steps_per_epoch = image_count // batch_size
)
```


[1]: https://www.tensorflow.org/tutorials/images/classification
[2]: https://data.mendeley.com/datasets/wzr2yv7r53/1
</details>

# Answer 1
**Score**: 0
ANSWER:
The Autotuning portion of the code that was commented out must stay commented out. If you don't, the memory that the process requests grows astronomically.
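For a rough sense of why that block can blow up memory: `image_dataset_from_directory` yields `float32` tensors, so each decoded 180 × 180 × 3 image costs 180 × 180 × 3 × 4 = 388,800 bytes, and `.cache()` without a filename keeps every decoded batch in RAM (both `train_ds` and `val_ds` are cached, so together they cover every image). The sketch below is only that arithmetic. The image count is an assumption inferred from the log: with `steps_per_epoch = image_count // batch_size`, the `304` steps and `batch_size = 288` imply at least 304 × 288 = 87,552 images. It ignores labels, tf.data overhead, and the `shuffle(1000)` buffer, which sits after batching and therefore holds up to 1,000 whole batches.

```python
def cached_bytes(num_images, height=180, width=180, channels=3, bytes_per_value=4):
    """Approximate RAM needed to cache decoded float32 images (tf.data overhead ignored)."""
    return num_images * height * width * channels * bytes_per_value


per_image = cached_bytes(1)
# Lower bound on image_count implied by the 304 steps in the log (assumption, see above)
num_images = 304 * 288
total = cached_bytes(num_images)
print(f"{per_image:,} bytes per image")                    # 388,800 bytes per image
print(f"{total / 1e9:.1f} GB for {num_images:,} images")   # 34.0 GB for 87,552 images
```

An estimate of about 34 GB is in the same range as the VM's 32 GB of RAM; treat it as an order-of-magnitude check, not a measurement.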

The two changes that I made:

1. Converting all of the photos from `.png` format to `.jpg` format. I did this with ImageMagick's `mogrify` command. More information about these file conversions is listed here.

   ```shell
   mogrify -format jpg *.png
   ```

2. Removing the very last line of the `.fit` setup: `steps_per_epoch = image_count // batch_size`. I saw that this was an issue when `image_count` was not divisible by `batch_size`. You can remove this line without any harm because `.fit` automatically calculates the correct number of steps to take per epoch.
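The step-count mismatch behind change 2 can be checked with plain arithmetic. `image_count` counts every image, but with `validation_split=0.2` the training subset holds only about 80% of them. As I understand Keras's split, the validation subset gets `int(validation_split * n)` files, training gets the rest, and the final partial batch is kept. The sketch assumes the `304` in the question's progress bar equals `image_count // 288`; `train_batches` is a hypothetical helper:

```python
import math


def train_batches(num_images, batch_size, validation_split=0.2):
    """Batches in the training subset, assuming validation takes
    int(validation_split * num_images) files and the last partial batch is kept."""
    num_val = int(validation_split * num_images)
    num_train = num_images - num_val
    return math.ceil(num_train / batch_size)


batch_size = 288
# Every image_count for which image_count // 288 == 304 (the step count in the log)
candidates = [n for n in range(300 * batch_size, 310 * batch_size) if n // batch_size == 304]
print(candidates[0], candidates[-1])                        # 87552 87839
print({train_batches(n, batch_size) for n in candidates})  # {244}
```

So `steps_per_epoch` asked for 304 steps per epoch while `train_ds` has about 244 batches; without the argument, `.fit` simply iterates over `train_ds` until it is exhausted.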

(I was able to get it to run the full first training with an accuracy of 10%!!!)
