Prefetch optimization of tf.data doesn't work
Question
I'm working with the tf.data API and analyzing the speed-ups obtained with the optimizations described here. In every case, though, I've noticed that the prefetch option doesn't improve performance.
It almost seems that no optimization is applied, so there is no overlap between CPU and GPU work.
I'm currently using TF 2.11.0, but I've also tried TF 2.10.0 and TF 2.8.3, and the result is the same.
I also tried different batch sizes.
I've also used different PCs with different GPUs, but the result is the same.
I worked with CIFAR-10, in which each image is 32x32 RGB; my training set consists of 40,000 images.
The dummy code I used is this:
import time

import tensorflow as tf
import tensorflow.keras as keras
import tensorflow.keras.layers as layers


def get_model_data_augmentation_CPU():
    """Return the Keras model for data-augmentation on CPU"""
    # Define Keras Model
    model = tf.keras.Sequential([
        layers.Conv2D(64, 3, activation='relu'),
        layers.MaxPooling2D(),
        layers.Dropout(0.1),
        layers.Conv2D(128, 3, activation='relu'),
        layers.MaxPooling2D(),
        layers.Dropout(0.1),
        layers.Conv2D(128, 3, activation='relu'),
        layers.MaxPooling2D(),
        layers.Dropout(0.2),
        layers.Flatten(),
        layers.Dense(256, activation='relu'),
        layers.Dropout(0.3),
        layers.Dense(10)
    ])
    adam_opt = keras.optimizers.Adam(learning_rate=0.001)
    model.compile(optimizer=adam_opt,
                  loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                  metrics=['accuracy'])
    return model


# Augmentation runs inside the tf.data pipeline (on the CPU), not in the model
data_augmentation = tf.keras.Sequential([
    tf.keras.layers.RandomFlip(mode='horizontal'),
    tf.keras.layers.RandomRotation(0.1),
    tf.keras.layers.RandomZoom(0.2),
])

model = get_model_data_augmentation_CPU()

BATCH_SIZE = 32
EPOCHS = 5  # not defined in the original snippet; placeholder value

(X_train, y_train), (X_test, y_test) = keras.datasets.cifar10.load_data()

dataset_train = tf.data.Dataset.from_tensor_slices((X_train, y_train))
dataset_train = dataset_train.map(lambda x, y: (data_augmentation(x), y), num_parallel_calls=3)
dataset_train = dataset_train.batch(BATCH_SIZE)
dataset_train = dataset_train.prefetch(1)  # If I comment out this line, the performance remains the same

start_time = time.time()
history = model.fit(
    dataset_train,
    epochs=EPOCHS,
)
end_time = time.time()
Answer 1
Score: 1
I created a notebook and tested on this dataset:
https://www.kaggle.com/competitions/dogs-vs-cats/data
I ran the notebook 3 times with prefetch and 3 times without prefetch and here are the results:
With prefetch, training time: 43.15s, 44.12s, 43.53s
Without prefetch, training time: 45.97s, 46.62s, 45.95s
That is roughly a 5.6% reduction in average training time. Not too bad.
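For anyone who wants to check the figure, here is a small sketch of the arithmetic. It assumes the percentage is the reduction in mean training time relative to the runs without prefetch; the variable names are just for illustration.

```python
from statistics import mean

# Training times in seconds, copied from the six runs above
with_prefetch = [43.15, 44.12, 43.53]
without_prefetch = [45.97, 46.62, 45.95]

mean_with = mean(with_prefetch)
mean_without = mean(without_prefetch)

# Relative reduction in mean training time, using the no-prefetch runs as the baseline
reduction_pct = 100 * (mean_without - mean_with) / mean_without

print(f"{mean_with:.2f} s vs {mean_without:.2f} s -> {reduction_pct:.1f}% less time")
```

Three runs per setting is a small sample, so the exact percentage should be read as approximate.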
I think there wasn't much improvement in your case because your whole dataset is already in RAM and preprocessing it takes very little time, so there was basically no difference.
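A rough way to see this is the idealized model usually used to explain prefetching: without it, each step costs data preparation plus training; with it, the two overlap and the step costs whichever is longer. The sketch below uses that model with made-up timings (the `prep_s` and `train_s` values are hypothetical, not measurements), so treat it as an illustration only.

```python
def step_time(prep_s, train_s, prefetch):
    """Idealized per-step time: serial without prefetch, overlapped with it."""
    return max(prep_s, train_s) if prefetch else prep_s + train_s


def speedup_pct(prep_s, train_s):
    """Percentage of step time saved by prefetching under the idealized model."""
    base = step_time(prep_s, train_s, prefetch=False)
    fast = step_time(prep_s, train_s, prefetch=True)
    return 100.0 * (base - fast) / base


# Hypothetical timings: cheap in-memory preprocessing vs. costly disk decoding
cheap = speedup_pct(prep_s=0.001, train_s=0.020)
costly = speedup_pct(prep_s=0.015, train_s=0.020)

print(f"cheap preprocessing:  {cheap:.1f}% saved")
print(f"costly preprocessing: {costly:.1f}% saved")
```

Under these assumptions, the saving is bounded by the share of the step spent preparing data, so a pipeline that is already fast has little to gain from prefetch.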
Here is the code (it's based on yours) if you want to try it out:
import time

import tensorflow as tf
import tensorflow.keras as keras
import tensorflow.keras.layers as layers
from tensorflow.keras.utils import image_dataset_from_directory


def get_model():
    """Return the Keras CNN model (uncompiled)"""
    # Define Keras Model
    model = tf.keras.Sequential([
        layers.Conv2D(64, 3, activation='relu'),
        layers.MaxPooling2D(),
        layers.Dropout(0.1),
        layers.Conv2D(128, 3, activation='relu'),
        layers.MaxPooling2D(),
        layers.Dropout(0.1),
        layers.Conv2D(128, 3, activation='relu'),
        layers.MaxPooling2D(),
        layers.Dropout(0.2),
        layers.Flatten(),
        layers.Dense(256, activation='relu'),
        layers.Dropout(0.3),
        layers.Dense(10, activation="softmax")
    ])
    return model


data_dir = "catdogs"

# Images are read from disk, so loading is no longer free as it is with in-memory CIFAR-10
train_dataset = image_dataset_from_directory(
    data_dir,
    seed=123,
    image_size=(128, 128),
    batch_size=32,
    subset="training",
    shuffle=True,
    validation_split=0.1,
)
val_dataset = image_dataset_from_directory(
    data_dir,
    seed=123,
    image_size=(128, 128),
    batch_size=32,
    subset="validation",
    shuffle=False,
    validation_split=0.1,
)

# Here the augmentation layers are part of the model itself
data_augmentation = tf.keras.Sequential([
    layers.Rescaling(1./255),
    layers.RandomRotation(0.2),
    layers.RandomFlip(mode='horizontal'),
    layers.RandomZoom(0.2),
])

model = keras.Sequential([
    data_augmentation,
    get_model()
])

adam_opt = keras.optimizers.Adam(learning_rate=0.001)
# The last layer applies softmax, so the loss takes probabilities (from_logits=False)
model.compile(optimizer=adam_opt,
              loss=tf.keras.losses.SparseCategoricalCrossentropy(),
              metrics=['accuracy'])

start_time = time.time()
history = model.fit(
    train_dataset,
    epochs=2,
    validation_data=val_dataset
)
end_time = time.time()
print(f"Total time: {end_time - start_time:.2f}")
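If it helps to see why prefetching can overlap data preparation with training at all, here is a conceptual pure-Python sketch of the idea: a background thread keeps a small buffer of ready batches while the consumer works on the current one. This is not how tf.data is implemented; `prefetch` and `batches` are hypothetical helper names used only for this illustration.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


def prefetch(iterable, buffer_size=1):
    """Yield items from `iterable` while a background thread keeps up to
    `buffer_size` finished items queued (a conceptual stand-in for prefetching)."""
    buf = queue.Queue(maxsize=buffer_size)

    def producer():
        try:
            for item in iterable:
                buf.put(item)   # blocks while the buffer is full
        finally:
            buf.put(_DONE)      # always signal the end so the consumer can stop

    threading.Thread(target=producer, daemon=True).start()

    while True:
        item = buf.get()        # blocks until the next item is ready
        if item is _DONE:
            return
        yield item


def batches(n):
    """Hypothetical input pipeline that yields n fake batches."""
    for i in range(n):
        yield [i] * 4


result = list(prefetch(batches(3), buffer_size=1))
print(result)
```

The items come out in the same order as without the wrapper; the only difference is that the next batch can be prepared while the current one is being consumed. Errors raised inside the producer are not forwarded to the consumer in this sketch.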