February 7, 2023, 03:57:00

Autodiff implementation for gradient calculation

Question

I have worked through some papers about the autodiff algorithm in order to implement it myself (for learning purposes). I compared my algorithm against TensorFlow's output in test cases, and the outputs did not match in most cases. Therefore I worked through the tutorial from this site and implemented it with TensorFlow operations, only for matrix multiplication, since that was one of the operations that did not work:

Gradient of matmul and the unbroadcast method:

def gradient_matmul(node, dx, adj):
    # dx tells us which of the two parents to differentiate with respect to
    a = node.parents[0]
    b = node.parents[1]
    # the operation was node.tensor = tf.matmul(a.tensor, b.tensor)
    if a == dx or b == dx:
        # the result depends on which parent we differentiate with respect to
        mm = tf.matmul(adj, tf.transpose(b.tensor)) if a == dx else \
                tf.matmul(tf.transpose(a.tensor), adj)
        return mm
    else:
        return None

def unbroadcast(adjoint, node):
    dim_a = len(adjoint.shape)
    dim_b = len(node.shape)
    if dim_a > dim_b:
        # sum over the leading axes that broadcasting added
        axes = tuple(range(dim_a - dim_b))
        res = tf.math.reduce_sum(adjoint, axis=axes)
        return res
    return adjoint

And finally the gradient calculation autodiff algorithm:

def gradient(y, dx):
    working = [y]
    adjoints = defaultdict(float)
    adjoints[y] = tf.ones(y.tensor.shape)
    while len(working) != 0:
        curr = working.pop(0)
        if curr == dx:
            return adjoints[curr]
        if curr.is_store:
            continue
        adj = adjoints[curr]
        for p in curr.parents:
            # for testing with matrix multiplication as only operation
            local_grad = gradient_matmul(curr, p, adj)
            adjoints[p] = unbroadcast(tf.add(adjoints[p], local_grad), p.tensor)
            if p not in working:
                working.append(p)

Yet it produces the same output as my initial implementation.
I constructed a matrix multiplication test case:

x = tf.constant([[[1.0, 1.0], [2.0, 3.0]], [[4.0, 5.0], [6.0, 7.0]]])
y = tf.constant([[3.0, -7.0], [-1.0, 5.0]])
z = tf.constant([[[1, 1], [2.0, 2]], [[3, 3], [-1, -1]]])
w = tf.matmul(tf.matmul(x, y), z)

where w should be differentiated with respect to each of the variables.
TensorFlow calculates the gradients:

[<tf.Tensor: shape=(2, 2, 2), dtype=float32, numpy=
array([[[-22.,  18.],
        [-22.,  18.]],
       [[ 32., -16.],
        [ 32., -16.]]], dtype=float32)>, <tf.Tensor: shape=(2, 2), dtype=float32, numpy=
array([[66., -8.],
       [80., -8.]], dtype=float32)>, <tf.Tensor: shape=(2, 2, 2), dtype=float32, numpy=
array([[[  5.,   5.],
        [ -1.,  -1.]],
       [[ 18.,  18.],
        [-10., -10.]]], dtype=float32)>]

My implementation calculates:

[[[-5.  7.]
  [-5.  7.]]
 [[-5.  7.]
  [-5.  7.]]]
[[33. 22.]
 [54. 36.]]
[[[ 9.  9.]
  [14. 14.]]
 [[-5. -5.]
  [-6. -6.]]]

Maybe the problem is the difference between NumPy's dot and TensorFlow's matmul?
But then I don't know how to fix the gradient or unbroadcast for the TensorFlow method...
Thanks for taking the time to look over my code!
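
On the NumPy question, a small check may help. The sketch below assumes NumPy is installed and uses two made-up 3-D arrays (a and b are illustrative, not from the post). It shows that np.dot and np.matmul treat stacked matrices differently: np.dot pairs every matrix in a with every matrix in b, while np.matmul pairs a[i] with b[i], which is also how tf.matmul handles batch dimensions.

```python
import numpy as np

# Two stacks of 2x2 matrices (illustrative values only).
a = np.arange(8.0).reshape(2, 2, 2)
b = np.arange(8.0, 16.0).reshape(2, 2, 2)

# np.dot contracts the last axis of a with the second-to-last axis of b
# for every combination of the remaining axes, so the result is 4-D.
dot_result = np.dot(a, b)

# np.matmul treats the leading axis as a batch axis and multiplies
# a[i] @ b[i], so the result stays 3-D.
matmul_result = np.matmul(a, b)

print("np.dot shape:   ", dot_result.shape)
print("np.matmul shape:", matmul_result.shape)

# The batched products are the "diagonal" slices of the np.dot result.
print(np.allclose(dot_result[0, :, 0, :], matmul_result[0]))
print(np.allclose(dot_result[1, :, 1, :], matmul_result[1]))
```

So a gradient rule written for 2-D np.dot does not carry over unchanged to batched matmul inputs.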

Answer 1

Score: 1

I found the error. The matmul gradient should have been:

def gradient_matmul(node, dx, adj):
    a = node.parents[0]
    b = node.parents[1]
    if a == dx:
        return tf.matmul(adj, b.tensor, transpose_b=True)
    elif b == dx:
        return tf.matmul(a.tensor, adj, transpose_a=True)
    else:
        return None

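since I only want to transpose the last two dimensions. tf.transpose without a perm argument reverses the order of all dimensions, which also reorders the batch dimension of the 3-D tensors in the test case.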
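
As a cross-check, here is a minimal NumPy sketch of the same idea. It is not the TensorFlow code from the post: it assumes NumPy is installed and substitutes np.matmul and np.swapaxes(..., -1, -2) for tf.matmul with transpose_a / transpose_b. The helper names matmul_grads and unbroadcast are made up for this illustration. It rebuilds the test case from the question and backpropagates an adjoint of ones through both matrix multiplications.

```python
import numpy as np

def matmul_grads(a, b, adj):
    """Local gradients of np.matmul(a, b), given the adjoint of the result."""
    # Swap only the last two axes; leading (batch) axes stay in place.
    grad_a = np.matmul(adj, np.swapaxes(b, -1, -2))
    grad_b = np.matmul(np.swapaxes(a, -1, -2), adj)
    return grad_a, grad_b

def unbroadcast(grad, shape):
    """Sum away the leading axes that broadcasting added."""
    extra = grad.ndim - len(shape)
    return grad.sum(axis=tuple(range(extra))) if extra > 0 else grad

# Test case from the question.
x = np.array([[[1.0, 1.0], [2.0, 3.0]], [[4.0, 5.0], [6.0, 7.0]]])
y = np.array([[3.0, -7.0], [-1.0, 5.0]])
z = np.array([[[1.0, 1.0], [2.0, 2.0]], [[3.0, 3.0], [-1.0, -1.0]]])

u = np.matmul(x, y)  # intermediate node, shape (2, 2, 2)
w = np.matmul(u, z)  # output node, shape (2, 2, 2)

# Reverse pass: start from an adjoint of ones on w.
adj_w = np.ones_like(w)
adj_u, grad_z = matmul_grads(u, z, adj_w)
grad_x, grad_y = matmul_grads(x, y, adj_u)

# y was broadcast over the batch axis, so its gradient is summed over it.
grad_x = unbroadcast(grad_x, x.shape)
grad_y = unbroadcast(grad_y, y.shape)

# A full transpose reverses all axes and is not the same as swapping the last two.
full_differs = not np.array_equal(np.transpose(z), np.swapaxes(z, -1, -2))

print(grad_x)
print(grad_y)
print(grad_z)
print("full transpose differs from last-two-axes swap:", full_differs)
```

With the last two axes swapped, grad_x, grad_y and grad_z agree with the gradients TensorFlow reported in the question.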
