Autodiff implementation for gradient calculation

Question

I have worked through some papers about the autodiff algorithm in order to implement it myself (for learning purposes). I compared my algorithm against TensorFlow's output in test cases, and the outputs did not match in most cases. Therefore I worked through the tutorial from this site and implemented it with TensorFlow operations, just for the matrix multiplication operation, since that was one of the operations that did not work:

Gradient of matmul and the unbroadcast method:

    import tensorflow as tf

    def gradient_matmul(node, dx, adj):
        # dx is needed to know which of the two parents should be differentiated
        a = node.parents[0]
        b = node.parents[1]
        # the forward operation was node.tensor = tf.matmul(a.tensor, b.tensor)
        if a == dx or b == dx:
            # result depends on which of the parents is being differentiated
            mm = tf.matmul(adj, tf.transpose(b.tensor)) if a == dx else \
                 tf.matmul(tf.transpose(a.tensor), adj)
            return mm
        else:
            return None

    def unbroadcast(adjoint, node):
        dim_a = len(adjoint.shape)
        dim_b = len(node.shape)
        if dim_a > dim_b:
            sum_axes = tuple(range(dim_a - dim_b))
            return tf.math.reduce_sum(adjoint, axis=sum_axes)
        return adjoint
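For intuition, here is a small usage sketch (not part of the original post) of what unbroadcast is meant to do in the test case further down: y is a plain (2, 2) matrix that gets broadcast against a batch of inputs, so its accumulated adjoint carries an extra leading batch axis that has to be summed away before it matches y's shape again:

    import tensorflow as tf

    # hypothetical adjoint for the shared (2, 2) matrix y: it arrives with a
    # leading batch axis of size 2 from the batched matmul
    adjoint = tf.ones((2, 2, 2))
    target_shape = (2, 2)

    # unbroadcast sums over the extra leading axes until the ranks match
    leading = tuple(range(len(adjoint.shape) - len(target_shape)))  # (0,)
    reduced = tf.math.reduce_sum(adjoint, axis=leading)
    print(reduced.shape)  # (2, 2)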

And finally the gradient calculation autodiff algorithm:

    from collections import defaultdict

    def gradient(y, dx):
        working = [y]
        adjoints = defaultdict(float)
        adjoints[y] = tf.ones(y.tensor.shape)
        while len(working) != 0:
            curr = working.pop(0)
            if curr == dx:
                return adjoints[curr]
            if curr.is_store:
                continue
            adj = adjoints[curr]
            for p in curr.parents:
                # for testing, matrix multiplication is the only operation
                local_grad = gradient_matmul(curr, p, adj)
                adjoints[p] = unbroadcast(tf.add(adjoints[p], local_grad), p.tensor)
                if p not in working:
                    working.append(p)
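For reference, the traversal above assumes a small graph-node wrapper roughly like the following; this is reconstructed from the attributes used (parents, tensor, is_store), since the actual class is not shown in the post:

    class Node:
        def __init__(self, tensor, parents=None, is_store=False):
            self.tensor = tensor          # concrete tf.Tensor value of this node
            self.parents = parents or []  # nodes this value was computed from
            self.is_store = is_store      # True for leaf/constant (input) nodes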

Yet it produces the same output as my initial implementation.
I constructed a matrix multiplication test case:

    x = tf.constant([[[1.0, 1.0], [2.0, 3.0]], [[4.0, 5.0], [6.0, 7.0]]])
    y = tf.constant([[3.0, -7.0], [-1.0, 5.0]])
    z = tf.constant([[[1, 1], [2.0, 2]], [[3, 3], [-1, -1]]])
    w = tf.matmul(tf.matmul(x, y), z)
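As a cross-check, the TensorFlow reference gradients below can presumably be reproduced with a tf.GradientTape; a sketch, reusing x, y and z from the snippet above and noting that the constants have to be watched explicitly because they are not tf.Variables:

    with tf.GradientTape() as tape:
        tape.watch([x, y, z])                  # constants are not tracked by default
        w = tf.matmul(tf.matmul(x, y), z)

    # gradients of the (implicitly summed) output w with respect to each input
    grads = tape.gradient(w, [x, y, z])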

Here w should be differentiated with respect to each of the variables.
TensorFlow calculates the gradients:

    [<tf.Tensor: shape=(2, 2, 2), dtype=float32, numpy=
    array([[[-22.,  18.],
            [-22.,  18.]],
           [[ 32., -16.],
            [ 32., -16.]]], dtype=float32)>,
     <tf.Tensor: shape=(2, 2), dtype=float32, numpy=
    array([[66., -8.],
           [80., -8.]], dtype=float32)>,
     <tf.Tensor: shape=(2, 2, 2), dtype=float32, numpy=
    array([[[  5.,   5.],
            [ -1.,  -1.]],
           [[ 18.,  18.],
            [-10., -10.]]], dtype=float32)>]

My implementation calculates:

    [[[-5.  7.]
      [-5.  7.]]
     [[-5.  7.]
      [-5.  7.]]]
    [[33. 22.]
     [54. 36.]]
    [[[ 9.  9.]
      [14. 14.]]
     [[-5. -5.]
      [-6. -6.]]]

Maybe the problem is the difference between NumPy's dot and TensorFlow's matmul?
But then I don't know how to fix the gradient or the unbroadcast for the TensorFlow method...
Thanks for taking the time to look over my code!

Answer 1

Score: 1

I found the error: the matmul gradient should have been

    def gradient_matmul(node, dx, adj):
        a = node.parents[0]
        b = node.parents[1]
        if a == dx:
            return tf.matmul(adj, b.tensor, transpose_b=True)
        elif b == dx:
            return tf.matmul(a.tensor, adj, transpose_a=True)
        else:
            return None

since I only want to transpose the last two dimensions (see the sketch below).
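This distinction matters for the batched test case above: tf.transpose with its default perm reverses every axis, including the batch axis, whereas the transpose_a/transpose_b arguments (like tf.linalg.matrix_transpose) only swap the last two. A minimal sketch illustrating the difference, with an arbitrarily chosen shape:

    import tensorflow as tf

    a = tf.zeros((2, 3, 4))                    # batch of two 3x4 matrices

    full = tf.transpose(a)                     # default perm reverses all axes
    last_two = tf.linalg.matrix_transpose(a)   # swaps only the last two axes

    print(full.shape)      # (4, 3, 2) -- batch axis ends up last
    print(last_two.shape)  # (2, 4, 3) -- batch axis preserved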
