Maximum 1X speedup achieved on Macbook Pro 2019 GPU for tensorflow operations
Question
Context:
I wanted to test the speedup that can be achieved by using the MacBook Pro 2019 GPU for tensorflow operations.
As advised, in the following snippet I am using the tensorflow library's built-in function tf.multiply() to compute the result of multiplying a tensor by a constant:
import tensorflow as tf

tensor = tf.constant([[1, 2],
                      [3, 4]])

def cpu():
    with tf.device('/CPU:0'):
        tensor_1 = tf.multiply(tensor, 2)
    return tensor_1

def gpu():
    with tf.device('/device:GPU:0'):
        tensor_1 = tf.multiply(tensor, 2)
    return tensor_1

# We run each op once to warm up; see: https://stackoverflow.com/a/45067900
cpu()
gpu()

import time

n = 10000

start = time.time()
for i in range(n):
    cpu()
end = time.time()
cpu_time = end - start

start = time.time()
for i in range(n):
    gpu()
end = time.time()
gpu_time = end - start

print('GPU speedup over CPU: {}x'.format(cpu_time / gpu_time))
Results:
I could achieve a maximum speedup of 1X.
My Question:
Ideally, if tf.multiply() is optimised to run on the GPU, why am I not getting a better speedup of, let's say, 2X to 10X?
System & environment:
- OS - macOS Ventura 13.3.1
- Processor - 2.6 GHz 6-Core Intel Core i7
- Graphics - AMD Radeon Pro 5300M 4 GB, Intel UHD Graphics 630 1536 MB
- Memory - 16 GB 2667 MHz DDR4
- tensorflow-macos python package version - 2.9.0
- tensorflow-metal python package version - 0.6.0
- Python version - 3.8.16
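As a side note (not part of the original question), one way to confirm that tensorflow-metal has actually registered the GPU before timing anything is TensorFlow's standard device-listing API; a minimal check could look like this:

import tensorflow as tf

# With tensorflow-metal installed, the AMD GPU should show up here
# as a physical device of type 'GPU'.
print(tf.config.list_physical_devices('CPU'))
print(tf.config.list_physical_devices('GPU'))

# Optionally, log the device on which each op is actually placed.
tf.debugging.set_log_device_placement(True)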
Answer 1
Score: 1
Special thanks to @JonSG and Dr. Matias Valdenegro-Toro (@Dr. Snoopy) for their comments suggesting that I try operations on tensors of higher order (i.e. larger shapes).
I implemented the tensor multiplication as follows and could achieve an 8X speedup:
import tensorflow as tf

tensor_1 = tf.random.uniform(shape=(400, 2700))
tensor_2 = tf.random.uniform(shape=(2700, 800))

def cpu():
    with tf.device('/CPU:0'):
        tensor_3 = tf.matmul(tensor_1, tensor_2)
    return tensor_3

def gpu():
    with tf.device('/device:GPU:0'):
        tensor_3 = tf.matmul(tensor_1, tensor_2)
    return tensor_3

# We run each op once to warm up; see: https://stackoverflow.com/a/45067900
cpu()
gpu()

import time

n = 1000

start = time.time()
for i in range(n):
    cpu()
end = time.time()
cpu_time = end - start

start = time.time()
for i in range(n):
    gpu()
end = time.time()
gpu_time = end - start

print('GPU speedup over CPU: {}x'.format(cpu_time / gpu_time))
Output:
GPU speedup over CPU: 8.871644178654558x
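Building on this, here is a minimal sketch (my own addition, not from the original answer) of how one could sweep the matrix size to see where the GPU starts to outpace the CPU; the sizes below are arbitrary examples, and tf.matmul / tf.random.uniform are used exactly as above:

import time
import tensorflow as tf

def bench(device, a, b, n=100):
    # Time n matrix multiplications on the given device, after one warm-up call.
    with tf.device(device):
        tf.matmul(a, b)  # warm-up
        start = time.time()
        for _ in range(n):
            tf.matmul(a, b)
        return time.time() - start

for size in (64, 256, 1024, 2048):  # arbitrary example sizes
    a = tf.random.uniform(shape=(size, size))
    b = tf.random.uniform(shape=(size, size))
    cpu_time = bench('/CPU:0', a, b)
    gpu_time = bench('/device:GPU:0', a, b)
    print('{}x{}: GPU speedup over CPU: {:.2f}x'.format(size, size, cpu_time / gpu_time))

For very small operands the per-op dispatch and data-transfer overhead tends to dominate, so the ratio hovers around (or below) 1X, which matches the behaviour seen with the 2x2 tf.multiply benchmark in the question.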