

TensorBoard histogram OneHot operation causing ResourceExhaustedError: OOM






I'm trying to train a VGG16 model. I'm using a sample dataset of 4000 images (300x300) in 14 classes, and running my code on a Google VM with an Nvidia L4 GPU with 20 GB of memory. I am running Python 3.7, TensorFlow 2.11, and CUDA 12.1. My data is stored in GCS.

When I run the model with the following TensorBoard callback: 

```
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=log_dir, histogram_freq=1)
```

I get this error at the end of the first epoch:

2023-06-14 19:51:21.248476: W tensorflow/tsl/framework/bfc_allocator.cc:479] Allocator (mklcpu) ran out of memory trying to allocate 22.97GiB (rounded to 24662507520)requested by op OneHot
If the cause is memory fragmentation maybe the environment variable 'TF_GPU_ALLOCATOR=cuda_malloc_async' will improve the situation.

`ResourceExhaustedError: {{function_node __wrapped__OneHot_device_/job:localhost/replica:0/task:0/device:CPU:0}} OOM when allocating tensor with shape[102760448,30] and type double on /job:localhost/replica:0/task:0/device:CPU:0 by allocator mklcpu [Op:OneHot]`

The error traces back to the TensorBoard histogram summary code:

```
ResourceExhaustedError                    Traceback (most recent call last)
/var/tmp/ipykernel_5723/1753739100.py in <module>
      1 # Fit model
----> 2 history = model.fit(train_ds, validation_data=val_ds, epochs=5, callbacks=[tensorboard_callback])

/opt/conda/lib/python3.7/site-packages/keras/utils/traceback_utils.py in error_handler(*args, **kwargs)
     68             # To get the full stack trace, call:
     69             # `tf.debugging.disable_traceback_filtering()`
---> 70             raise e.with_traceback(filtered_tb) from None
     71         finally:
     72             del filtered_tb

/opt/conda/lib/python3.7/site-packages/tensorboard/plugins/histogram/summary_v2.py in histogram(name, data, step, buckets, description)
    198             tensor=lazy_tensor,
    199             step=step,
--> 200             metadata=summary_metadata,
    201         )

/opt/conda/lib/python3.7/site-packages/tensorboard/util/lazy_tensor_creator.py in __call__(self)
     64                 elif self._tensor is None:
     65                     self._tensor = _CALL_IN_PROGRESS_SENTINEL
---> 66                     self._tensor = self._tensor_callable()
     67         return self._tensor

/opt/conda/lib/python3.7/site-packages/tensorboard/plugins/histogram/summary_v2.py in lazy_tensor()
    192         @lazy_tensor_creator.LazyTensorCreator
    193         def lazy_tensor():
--> 194             return _buckets(data, buckets)
    196         return tf.summary.write(

/opt/conda/lib/python3.7/site-packages/tensorboard/plugins/histogram/summary_v2.py in _buckets(data, bucket_count)
    291             )
--> 293         return tf.cond(is_empty, when_empty, when_nonempty)

/opt/conda/lib/python3.7/site-packages/tensorboard/plugins/histogram/summary_v2.py in when_nonempty()
    289             return tf.cond(
--> 290                 has_single_value, when_single_value, when_multiple_values
    291             )

/opt/conda/lib/python3.7/site-packages/tensorboard/plugins/histogram/summary_v2.py in when_multiple_values()
    257                 # See https://github.com/tensorflow/tensorflow/issues/51419 for details.
    258                 one_hots = tf.one_hot(
--> 259                     clamped_indices, depth=bucket_count, dtype=tf.float64
    260                 )
    261                 bucket_counts = tf.cast(

ResourceExhaustedError: {{function_node __wrapped__OneHot_device_/job:localhost/replica:0/task:0/device:CPU:0}} OOM when allocating tensor with shape[102760448,30] and type double on /job:localhost/replica:0/task:0/device:CPU:0 by allocator mklcpu [Op:OneHot]
```

Interestingly, it seems to be calling tf.one_hot and exhausting memory with a massive tensor regardless of whether I train the model with integer labels and sparse categorical cross-entropy or with one-hot labels and categorical cross-entropy. I don't really understand what the tensor contains, because its dimensions relate to neither the number of training examples nor the number of classes I am using.

Any ideas about how to fix this?


# Answer 1
**Score**: 1






The issue appears to be a memory limit rather than a bug in TensorFlow. One-hot encoding creates a very large, mostly zero tensor, which can require a lot of memory. Because you set histogram_freq=1, weight histograms are computed for every layer at the end of each epoch, and that computation needs additional memory.
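The shape in the error message can be checked against this explanation with a little arithmetic. The sketch below multiplies the reported shape `[102760448, 30]` by 8 bytes per element (the traceback shows `dtype=tf.float64`) and compares the result with the byte count in the allocator warning. The helper name `one_hot_bytes` is made up for illustration. The last check rests on an assumption that the question does not confirm: 25088 x 4096 is the kernel size of the first fully connected layer in the standard VGG16 architecture, and it happens to equal the first dimension of the failing tensor, which would mean the histogram is being built over that layer's weights.

```python
def one_hot_bytes(num_values: int, buckets: int = 30, bytes_per_element: int = 8) -> int:
    """Bytes needed for a dense [num_values, buckets] one-hot tensor.

    buckets=30 and bytes_per_element=8 (float64) are taken from the shape
    and dtype reported in the error message.
    """
    return num_values * buckets * bytes_per_element


# First dimension of the tensor in the OOM message.
num_values = 102_760_448

total_bytes = one_hot_bytes(num_values)
print(total_bytes)                       # byte count to compare with the allocator warning
print(f"{total_bytes / 2**30:.2f} GiB")  # same figure expressed in GiB

# Assumption: 25088 x 4096 is the kernel shape of VGG16's first dense layer
# (7 * 7 * 512 inputs, 4096 units). The question does not show the model code.
print(25088 * 4096 == num_values)
```

If that reading is right, the tensor has one row per weight and one column per histogram bucket, so its size depends on the largest layer rather than on the number of training examples or classes.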

Try setting histogram_freq=0 and check whether the problem still occurs. If it does, we will need to look at the part of your code that produces the large tensor. If it does not, the extra memory needed for the histogram computation is clearly the cause.
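A minimal sketch of that test, assuming the rest of the training code stays as in the question. The log directory `logs/fit` and the names `TENSORBOARD_KWARGS` and `make_tensorboard_callback` are placeholders, since the question does not show how `log_dir` is defined. With `histogram_freq=0` the callback does not compute weight histograms, so the `tf.one_hot` call in the traceback should not be reached.

```python
# Hypothetical log directory; the question does not show how log_dir is set.
LOG_DIR = "logs/fit"

# histogram_freq=0 disables weight histograms in the Keras TensorBoard callback.
TENSORBOARD_KWARGS = {"log_dir": LOG_DIR, "histogram_freq": 0}


def make_tensorboard_callback():
    """Build the callback from the settings above.

    TensorFlow is imported inside the function so the settings can be
    inspected on a machine where TensorFlow is not installed.
    """
    import tensorflow as tf

    return tf.keras.callbacks.TensorBoard(**TENSORBOARD_KWARGS)


# Usage, with model, train_ds and val_ds defined as in the question:
# tensorboard_callback = make_tensorboard_callback()
# model.fit(train_ds, validation_data=val_ds, epochs=5,
#           callbacks=[tensorboard_callback])
```

Scalar metrics such as loss and accuracy should still be written to the log directory; only the per-layer histograms are skipped.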

OOM errors depend on the input sizes and the available memory. TensorFlow cannot control this; it has to be managed by the user.

You could also reduce batch_size in model.fit, which is 32 by default (try batch_size=16, for example). In addition, run the code below before importing TensorFlow in your code block. This may help if the OOM is caused by memory fragmentation.

```
import os

# Must be set before TensorFlow is imported to take effect.
os.environ['TF_GPU_ALLOCATOR'] = 'cuda_malloc_async'
```

  • Posted on 2023-06-15 04:53:55
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76477464.html



