TensorFlow: Why is the training of an RNN too slow on Apple Silicon M2?
Question
I am getting the "tensorflow:Layer lstm will not use cuDNN kernels since it doesn't meet the criteria. It will use a generic GPU kernel as fallback when running on GPU." warning while training my LSTM model on Apple Silicon M2. The training is just too slow. How can I get the best out of this chip for my task?
PS: (1) I've already installed the tensorflow-macos and tensorflow-metal packages alongside the tensorflow-deps package provided in the Apple channel of Conda.
(2) My model is not particularly deep either: it consists of one LSTM layer with 64 units and one dense layer with 64 units.
(3) My machine's main specifications:
- macOS v13.2.1 (Ventura) (the latest stable one)
- Apple Silicon M2 (8-core CPU, 10-core GPU, and 16-core Neural Engine)
- 16 GB unified memory
Answer 1
Score: 2
Since you're using Apple Silicon, where NVIDIA's cuDNN isn't available at all, the cuDNN warning most likely isn't the culprit here.
Try training on the CPU and compare the training times. Your model is small, so the overhead of dispatching work to the GPU is probably the leading cause. As the model gets larger, that overhead tends to be amortized. See the Troubleshooting section on this page.
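A minimal sketch of that CPU-versus-GPU comparison, assuming tensorflow-macos (and optionally tensorflow-metal) is installed. The synthetic data shapes, the batch size, the final one-unit output layer, and the helper names `train_seconds`, `fastest_device`, and `compare_devices` are all illustrative assumptions, not part of the original question. It relies on `tf.device` to place the Keras training ops, which should work in TensorFlow 2.x but is worth verifying on your version.

```python
import time


def train_seconds(device_name, epochs=3):
    """Return wall-clock seconds for `epochs` of training on `device_name`."""
    # Imported lazily so fastest_device() below works without TensorFlow.
    import numpy as np
    import tensorflow as tf

    # Synthetic data (assumption): 1024 sequences, 20 time steps, 8 features.
    x = np.random.rand(1024, 20, 8).astype("float32")
    y = np.random.rand(1024, 1).astype("float32")

    with tf.device(device_name):
        # Roughly the model from the question: one 64-unit LSTM layer and
        # one 64-unit dense layer; the output layer is an assumption.
        model = tf.keras.Sequential([
            tf.keras.Input(shape=(20, 8)),
            tf.keras.layers.LSTM(64),
            tf.keras.layers.Dense(64, activation="relu"),
            tf.keras.layers.Dense(1),
        ])
        model.compile(optimizer="adam", loss="mse")
        # Warm-up epoch so one-time graph tracing is not included in the timing.
        model.fit(x, y, batch_size=64, epochs=1, verbose=0)
        start = time.perf_counter()
        model.fit(x, y, batch_size=64, epochs=epochs, verbose=0)
        return time.perf_counter() - start


def fastest_device(timings):
    """Return the device name with the smallest measured time."""
    return min(timings, key=timings.get)


def compare_devices():
    """Time the same model on the CPU and, if one is visible, on the GPU."""
    import tensorflow as tf

    devices = ["/CPU:0"]
    if tf.config.list_physical_devices("GPU"):
        devices.append("/GPU:0")
    timings = {name: train_seconds(name) for name in devices}
    for name, seconds in timings.items():
        print(f"{name}: {seconds:.2f} s")
    print("fastest:", fastest_device(timings))
    return timings
```

Calling `compare_devices()` prints one timing per device. If the CPU wins at this model size, try a larger batch size before concluding the GPU is unusable, since bigger batches give the GPU more work per dispatch.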
Answer 2
Score: 1
Same issue here with another basic Keras model with 2 LSTM layers (https://keras.io/examples/vision/handwriting_recognition/). My Mac mini M2 Pro (tensorflow_metal-1.0.1, macOS 13.4.1) trains it at half the speed of a 10-year-old iMac (training on its 3.5 GHz quad-core Intel Core i5 CPU, macOS 10.15.7, TensorFlow 2.12), which is quite pathetic. Fortunately, disabling tensorflow_metal brings back some performance on the M2 Pro, but it is then only 30% faster than the 10-year-old iMac. Looks like more work is needed on the tensorflow-metal implementation before it can be blindly trusted...
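The answer does not say how tensorflow_metal was disabled (uninstalling the package is one way). As a hedged alternative, the sketch below hides the GPU from TensorFlow at runtime with `tf.config.set_visible_devices`, which should leave training on the CPU without touching the installed packages. The helper names `hide_gpus_from_tensorflow` and `uses_only_cpu` are hypothetical, and the call has to run before TensorFlow initializes its devices, so it belongs at the very top of the script.

```python
def hide_gpus_from_tensorflow():
    """Hide all GPUs from TensorFlow and return the visible device types."""
    # Imported lazily so uses_only_cpu() below works without TensorFlow.
    import tensorflow as tf

    try:
        # An empty list means no GPU is visible to this process.
        tf.config.set_visible_devices([], "GPU")
    except RuntimeError as err:
        # Raised when the runtime was already initialized; in that case,
        # move this call earlier in the program.
        print("Could not change visible devices:", err)
    return [d.device_type for d in tf.config.get_visible_devices()]


def uses_only_cpu(device_types):
    """True if at least one device is visible and every one is a CPU."""
    return bool(device_types) and all(t == "CPU" for t in device_types)
```

Checking `uses_only_cpu(hide_gpus_from_tensorflow())` before building the model is a quick way to confirm that the Metal GPU is actually out of the picture for that run.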