April 7, 2023, 01:12:31

Prefetch optimization of tf.data doesn't work

Question

I am working with the tf.data API and analyzing the speed-ups obtained with the various recommended optimizations. In every case, I have noticed that the prefetch option does not improve performance.

It almost seems as if no optimization is applied at all, so there is no overlap between CPU and GPU work.

I am currently using TF 2.11.0, but I have also tried TF 2.10.0 and TF 2.8.3 and the result is the same.

I also tried different batch sizes.

I also tried different PCs with different GPUs, and the result is still the same.

I worked with CIFAR-10, in which each image is 32x32 RGB; the training set consists of 40,000 images.

Here is a dummy version of the code I used:

import time

import tensorflow as tf
import tensorflow.keras as keras
import tensorflow.keras.layers as layers

EPOCHS = 2  # not defined in the original snippet; placeholder value
BATCH_SIZE = 32


def get_model_data_augmentation_CPU():
    """Return the Keras model for data-augmentation on CPU"""
    # Define Keras Model
    model = tf.keras.Sequential([
        layers.Conv2D(64, 3, activation='relu'),
        layers.MaxPooling2D(),
        layers.Dropout(0.1),
        layers.Conv2D(128, 3, activation='relu'),
        layers.MaxPooling2D(),
        layers.Dropout(0.1),
        layers.Conv2D(128, 3, activation='relu'),
        layers.MaxPooling2D(),
        layers.Dropout(0.2),
        layers.Flatten(),
        layers.Dense(256, activation='relu'),
        layers.Dropout(0.3),
        layers.Dense(10)
    ])
    adam_opt = keras.optimizers.Adam(learning_rate=0.001)
    model.compile(optimizer=adam_opt,
                  loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                  metrics=['accuracy'])
    return model


# Augmentation layers applied per image inside the tf.data pipeline (on CPU)
data_augmentation = tf.keras.Sequential([
    tf.keras.layers.RandomFlip(mode='horizontal'),
    tf.keras.layers.RandomRotation(0.1),
    tf.keras.layers.RandomZoom(0.2),
])

model = get_model_data_augmentation_CPU()

(X_train, y_train), (X_test, y_test) = keras.datasets.cifar10.load_data()
dataset_train = tf.data.Dataset.from_tensor_slices((X_train, y_train))
dataset_train = dataset_train.map(lambda x, y: (data_augmentation(x), y), num_parallel_calls=3)
dataset_train = dataset_train.batch(BATCH_SIZE)
dataset_train = dataset_train.prefetch(1)  # If I comment this line out, the performance remains the same

start_time = time.time()
history = model.fit(
    dataset_train,
    epochs=EPOCHS,
)
end_time = time.time()

Answer 1

Score: 1

I created a notebook and tested on this dataset:
https://www.kaggle.com/competitions/dogs-vs-cats/data

I ran the notebook three times with prefetch and three times without. The training times were:
With prefetch: 43.15s, 44.12s, 43.53s
Without prefetch: 45.97s, 46.62s, 45.95s
That is an improvement of roughly 5.6% in mean training time. Not too bad.
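
For anyone who wants to check the arithmetic, here is a small sketch that recomputes the figure from the six timings reported above. The helper name `relative_improvement` is only illustrative; it compares the mean time of each group.

```python
from statistics import mean

# Training times in seconds, copied from the runs reported above.
with_prefetch = [43.15, 44.12, 43.53]
without_prefetch = [45.97, 46.62, 45.95]


def relative_improvement(baseline, improved):
    """Return the fractional reduction in mean time from baseline to improved."""
    base = mean(baseline)
    return (base - mean(improved)) / base


improvement = relative_improvement(without_prefetch, with_prefetch)
print(f"mean with prefetch:    {mean(with_prefetch):.2f}s")
print(f"mean without prefetch: {mean(without_prefetch):.2f}s")
print(f"improvement:           {improvement:.1%}")
```

The means come out at 43.60s and 46.18s, which gives the 5.6% quoted above. With only three runs per configuration this is a rough estimate, not a statistically robust one.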

I think there wasn't much improvement in your case because your whole dataset is already in RAM and processing it takes very little time, so prefetching makes basically no difference.
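
One way to see this reasoning is an idealized model of a two-stage pipeline, sketched below. It is not a measurement and not how tf.data is implemented: it assumes each batch takes a constant `produce_s` seconds to prepare on the CPU and a constant `consume_s` seconds to train on, and that a one-batch prefetch buffer lets the two stages overlap perfectly. The function names and the example numbers are hypothetical.

```python
def epoch_time(n_batches, produce_s, consume_s, prefetch):
    """Idealized epoch time for a two-stage input pipeline.

    produce_s: seconds the CPU needs to prepare one batch (assumed constant)
    consume_s: seconds one training step takes (assumed constant)
    """
    if n_batches == 0:
        return 0.0
    if not prefetch:
        # Stages run back to back: every step pays for both.
        return n_batches * (produce_s + consume_s)
    # With overlap, after the first batch has been produced each step costs
    # only the slower of the two stages; the last batch still has to be consumed.
    return produce_s + (n_batches - 1) * max(produce_s, consume_s) + consume_s


def saving(n_batches, produce_s, consume_s):
    """Fraction of epoch time saved by prefetching under this model."""
    base = epoch_time(n_batches, produce_s, consume_s, prefetch=False)
    overlapped = epoch_time(n_batches, produce_s, consume_s, prefetch=True)
    return (base - overlapped) / base


# Hypothetical numbers: a cheap in-memory pipeline vs. an expensive one.
print(f"cheap input (0.5 ms vs 10 ms step): {saving(1000, 0.0005, 0.010):.1%}")
print(f"costly input (10 ms vs 10 ms step): {saving(1000, 0.010, 0.010):.1%}")
```

Under these assumptions the saving is bounded by the share of time spent producing batches, so when the input stage is already much cheaper than the training step, prefetch has little left to hide.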

Here is the code (it's based on yours) if you want to try it out:

import time

import tensorflow as tf
import tensorflow.keras as keras
import tensorflow.keras.layers as layers
from tensorflow.keras.utils import image_dataset_from_directory


def get_model():
    """Return the Keras CNN classifier (not compiled)"""
    # Define Keras Model
    model = tf.keras.Sequential([
        layers.Conv2D(64, 3, activation='relu'),
        layers.MaxPooling2D(),
        layers.Dropout(0.1),
        layers.Conv2D(128, 3, activation='relu'),
        layers.MaxPooling2D(),
        layers.Dropout(0.1),
        layers.Conv2D(128, 3, activation='relu'),
        layers.MaxPooling2D(),
        layers.Dropout(0.2),
        layers.Flatten(),
        layers.Dense(256, activation='relu'),
        layers.Dropout(0.3),
        layers.Dense(10, activation="softmax")
    ])
    return model


# Images are read from disk, so the input pipeline has real work to do
data_dir = "catdogs"
train_dataset = image_dataset_from_directory(
    data_dir,
    seed=123,
    image_size=(128, 128),
    batch_size=32,
    subset="training",
    shuffle=True,
    validation_split=0.1,
)
val_dataset = image_dataset_from_directory(
    data_dir,
    seed=123,
    image_size=(128, 128),
    batch_size=32,
    subset="validation",
    shuffle=False,
    validation_split=0.1,
)

# Rescaling and augmentation run as the first layers of the model
data_augmentation = tf.keras.Sequential([
    layers.Rescaling(1./255),
    layers.RandomRotation(0.2),
    layers.RandomFlip(mode='horizontal'),
    layers.RandomZoom(0.2),
])
model = keras.Sequential([
    data_augmentation,
    get_model()
])

# The model ends in softmax, so the loss takes probabilities (from_logits=False)
adam_opt = keras.optimizers.Adam(learning_rate=0.001)
model.compile(optimizer=adam_opt,
              loss=tf.keras.losses.SparseCategoricalCrossentropy(),
              metrics=['accuracy'])

start_time = time.time()
history = model.fit(
    train_dataset,
    epochs=2,
    validation_data=val_dataset
)
end_time = time.time()
print(f"Total time: {end_time - start_time:.2f}")
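
For intuition about the CPU/GPU overlap that prefetching is meant to provide, here is a conceptual sketch in plain Python. It is not tf.data's implementation: the generator `prefetch` and the sentinel `_DONE` are hypothetical names, and the sketch only shows the general idea of a background producer filling a bounded buffer while the consumer works on the previous item.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the source iterable


def prefetch(iterable, buffer_size=1):
    """Yield items from `iterable` while a background thread prepares the next ones."""
    buf = queue.Queue(maxsize=buffer_size)

    def producer():
        try:
            for item in iterable:
                buf.put(item)  # blocks while the buffer is full
        finally:
            # Always signal the end so the consumer never waits forever.
            # Note: in this sketch a producer error ends the stream silently.
            buf.put(_DONE)

    # Daemon thread: it will not keep the process alive if the consumer stops early.
    threading.Thread(target=producer, daemon=True).start()

    while True:
        item = buf.get()  # blocks until the producer has an item ready
        if item is _DONE:
            return
        yield item


batches = list(prefetch((i * i for i in range(5)), buffer_size=1))
print(batches)  # [0, 1, 4, 9, 16]
```

The order of items is preserved, and the buffer size caps how far ahead the producer may run. If producing an item is much cheaper than consuming it, the producer spends most of its time blocked on a full buffer, which matches the case where prefetching brings little benefit.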

