Maximum 1X speedup achieved on Macbook Pro 2019 GPU for tensorflow operations
Question
Context:
I wanted to test the speedup that can be achieved by using the MacBook Pro 2019 GPU for tensorflow operations.
As advised, in the following snippet I am using the tensorflow library's built-in function tf.multiply() to compute the result of multiplying a tensor by a constant:
import tensorflow as tf

tensor = tf.constant([[1, 2],
                      [3, 4]])

def cpu():
    with tf.device('/CPU:0'):
        tensor_1 = tf.multiply(tensor, 2)
    return tensor_1

def gpu():
    with tf.device('/device:GPU:0'):
        tensor_1 = tf.multiply(tensor, 2)
    return tensor_1

# We run each op once to warm up; see: https://stackoverflow.com/a/45067900
cpu()
gpu()

import time

n = 10000

start = time.time()
for i in range(n):
    cpu()
end = time.time()
cpu_time = end - start

start = time.time()
for i in range(n):
    gpu()
end = time.time()
gpu_time = end - start

print('GPU speedup over CPU: {}x'.format(cpu_time / gpu_time))
Results:
I could achieve a maximum speedup of 1X.
My Question:
Ideally, if tf.multiply() is optimised to run on the GPU, why am I not getting a better speedup of, let's say, 2X to 10X?
System & environment:
- OS - macOS Ventura 13.3.1
- Processor - 2.6 GHz 6-Core Intel Core i7
- Graphics - AMD Radeon Pro 5300M 4 GB, Intel UHD Graphics 630 1536 MB
- Memory - 16 GB 2667 MHz DDR4
- tensorflow-macos python package version - 2.9.0
- tensorflow-metal python package version - 0.6.0
- Python version - 3.8.16
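As a side note (not part of the original question), one way to confirm that tensorflow-metal has actually registered the GPU before timing anything is TensorFlow's standard device-listing API; a minimal check could look like this:

import tensorflow as tf

# With tensorflow-metal installed, the AMD GPU should show up here
# as a physical device of type 'GPU'.
print(tf.config.list_physical_devices('CPU'))
print(tf.config.list_physical_devices('GPU'))

# Optionally, log the device on which each op is actually placed.
tf.debugging.set_log_device_placement(True)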
Answer 1
Score: 1
Special thanks to @JonSG and Dr. Matias Valdenegro-Toro (@Dr. Snoopy) for their comments suggesting that I try operations on tensors of higher order (i.e. larger shapes).
I implemented the tensor multiplication as follows and could achieve an 8X speedup:
import tensorflow as tf

tensor_1 = tf.random.uniform(shape=(400, 2700))
tensor_2 = tf.random.uniform(shape=(2700, 800))

def cpu():
    with tf.device('/CPU:0'):
        tensor_3 = tf.matmul(tensor_1, tensor_2)
    return tensor_3

def gpu():
    with tf.device('/device:GPU:0'):
        tensor_3 = tf.matmul(tensor_1, tensor_2)
    return tensor_3

# We run each op once to warm up; see: https://stackoverflow.com/a/45067900
cpu()
gpu()

import time

n = 1000

start = time.time()
for i in range(n):
    cpu()
end = time.time()
cpu_time = end - start

start = time.time()
for i in range(n):
    gpu()
end = time.time()
gpu_time = end - start

print('GPU speedup over CPU: {}x'.format(cpu_time / gpu_time))
Output:
GPU speedup over CPU: 8.871644178654558x
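Building on this, here is a minimal sketch (my own addition, not from the original answer) of how one could sweep the matrix size to see where the GPU starts to outpace the CPU; the sizes below are arbitrary examples, and tf.matmul / tf.random.uniform are used exactly as above:

import time
import tensorflow as tf

def bench(device, a, b, n=100):
    # Time n matrix multiplications on the given device, after one warm-up call.
    with tf.device(device):
        tf.matmul(a, b)  # warm-up
        start = time.time()
        for _ in range(n):
            tf.matmul(a, b)
        return time.time() - start

for size in (64, 256, 1024, 2048):  # arbitrary example sizes
    a = tf.random.uniform(shape=(size, size))
    b = tf.random.uniform(shape=(size, size))
    cpu_time = bench('/CPU:0', a, b)
    gpu_time = bench('/device:GPU:0', a, b)
    print('{}x{}: GPU speedup over CPU: {:.2f}x'.format(size, size, cpu_time / gpu_time))

For very small operands the per-op dispatch and data-transfer overhead tends to dominate, so the ratio hovers around (or below) 1X, which matches the behaviour seen with the 2x2 tf.multiply benchmark in the question.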