TensorFlow: Why is training an RNN too slow on Apple Silicon M2?

Question

I get the warning "tensorflow:Layer lstm will not use cuDNN kernels since it doesn't meet the criteria. It will use a generic GPU kernel as fallback when running on GPU." while training my LSTM model on an Apple Silicon M2, and training is far too slow. How can I get the best out of this chip for my task?

PS: (1) I've already installed the tensorflow-macos and tensorflow-metal packages alongside the tensorflow-deps package provided in the Apple channel of Conda.

(2) My model is not deep either: it consists of one LSTM layer with 64 units and one dense layer with 64 units.

(3) My machine's main specifications:

macOS v13.2.1 (Ventura) (the latest stable one)
Apple Silicon M2 (8-core CPU, 10-core GPU, and 16-core neural engine)
16 GB unified memory

Answer 1

Score: 2

Since you're using Apple Silicon, cuDNN most likely isn't the culprit here.

Try training on the CPU and compare the time cost. Your model isn't large, so the overhead of dispatching work to the GPU is likely the main cause here. As your model gets larger, that overhead tends to get amortized. See the Troubleshooting section on this page.
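
A minimal sketch of that comparison, assuming random placeholder data rather than your real dataset. The helper names (time_call, train_small_lstm, compare_devices) and the input shapes are made up for illustration; only the LSTM(64) + Dense(64) layout comes from the question. Device placement uses tf.device, and the timings include one-off graph tracing, so treat the numbers as a rough indication only.

```python
import time


def time_call(fn, *args):
    """Return the wall-clock seconds taken by fn(*args)."""
    start = time.perf_counter()
    fn(*args)
    return time.perf_counter() - start


def train_small_lstm(device, samples=6400, timesteps=50, features=8):
    """Train an LSTM(64) + Dense(64) model for one epoch on `device`.

    The data is random and the shapes are assumptions, not the asker's data.
    """
    # Imported here so the timing helpers above work without TensorFlow.
    import numpy as np
    import tensorflow as tf

    x = np.random.rand(samples, timesteps, features).astype("float32")
    y = np.random.rand(samples, 64).astype("float32")

    # Build and train the model with ops placed on the requested device.
    with tf.device(device):
        model = tf.keras.Sequential([
            tf.keras.Input(shape=(timesteps, features)),
            tf.keras.layers.LSTM(64),
            tf.keras.layers.Dense(64),
        ])
        model.compile(optimizer="adam", loss="mse")
        model.fit(x, y, batch_size=32, epochs=1, verbose=0)


def compare_devices(devices=("/CPU:0", "/GPU:0"), train=train_small_lstm):
    """Return {device: seconds} for one training run per device."""
    return {device: time_call(train, device) for device in devices}
```

Calling compare_devices() and printing the result shows which device finishes the epoch faster on your machine. Another way to force CPU execution is to hide the GPU with tf.config.set_visible_devices([], "GPU") before TensorFlow runs any operation.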

Answer 2

Score: 1

Same issue here with another basic Keras model with 2 LSTM layers (https://keras.io/examples/vision/handwriting_recognition/). My Mac mini M2 Pro (tensorflow_metal-1.0.1, macOS 13.4.1) runs twice as slow as a 10-year-old iMac (training the model on its 3.5 GHz quad-core Intel Core i5 CPU, macOS 10.15.7, TensorFlow 2.12), which is quite pathetic. Fortunately, disabling tensorflow_metal brings back some performance on the M2 Pro, but it is still only 30% faster than the 10-year-old iMac. Looks like more work is needed on their tensorflow-metal implementation before it can be blindly trusted...
