Autodiff implementation for gradient calculation
Question
I have worked through some papers on the autodiff algorithm in order to implement it myself (for learning purposes). I compared my algorithm against TensorFlow's output in several test cases, and in most cases the results did not match. I therefore worked through the tutorial from this site and implemented it with TensorFlow operations, for the matrix multiplication operation only, since that was one of the operations that did not work.
Gradient of matmul and the unbroadcast method:
import tensorflow as tf

def gradient_matmul(node, dx, adj):
    # dx tells us which of the two parents to differentiate with respect to
    a = node.parents[0]
    b = node.parents[1]
    # the forward operation was node.tensor = tf.matmul(a.tensor, b.tensor)
    if a == dx or b == dx:
        # the result depends on which parent we differentiate with respect to
        mm = tf.matmul(adj, tf.transpose(b.tensor)) if a == dx else \
            tf.matmul(tf.transpose(a.tensor), adj)
        return mm
    else:
        return None

def unbroadcast(adjoint, node):
    dim_a = len(adjoint.shape)
    dim_b = len(node.shape)
    if dim_a > dim_b:
        # sum over the leading axes that broadcasting added
        axes = tuple(range(dim_a - dim_b))
        res = tf.math.reduce_sum(adjoint, axis=axes)
        return res
    return adjoint
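To illustrate what the unbroadcast step is meant to do, here is a minimal sketch. It uses NumPy as a stand-in for TensorFlow (an assumption made so the snippet runs without TensorFlow), and `unbroadcast_np` is a hypothetical name, not part of the code above. An adjoint of shape (2, 2, 2) flowing back to a (2, 2) operand is summed over the leading batch axis:

```python
import numpy as np

def unbroadcast_np(adjoint, target_shape):
    """Hypothetical helper: sum away the leading axes that broadcasting added."""
    extra = adjoint.ndim - len(target_shape)
    if extra > 0:
        return adjoint.sum(axis=tuple(range(extra)))
    return adjoint

adjoint = np.arange(8.0).reshape(2, 2, 2)   # stand-in for a batched adjoint
reduced = unbroadcast_np(adjoint, (2, 2))   # target operand has no batch axis
print(reduced.shape)
```

Like the original, this sketch only handles leading axes; an operand with size-1 axes that were stretched by broadcasting would also need a sum over those axes with `keepdims=True`.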
And finally, the autodiff algorithm for the gradient calculation:
from collections import defaultdict

def gradient(y, dx):
    working = [y]
    adjoints = defaultdict(float)
    adjoints[y] = tf.ones(y.tensor.shape)
    while len(working) != 0:
        curr = working.pop(0)
        if curr == dx:
            return adjoints[curr]
        if curr.is_store:
            continue
        adj = adjoints[curr]
        for p in curr.parents:
            # for testing with matrix multiplication as the only operation
            local_grad = gradient_matmul(curr, p, adj)
            adjoints[p] = unbroadcast(tf.add(adjoints[p], local_grad), p.tensor)
            if p not in working:
                working.append(p)
Yet it produces the same output as my initial implementation.
I constructed a matrix multiplication test case:
x = tf.constant([[[1.0, 1.0], [2.0, 3.0]], [[4.0, 5.0], [6.0, 7.0]]])
y = tf.constant([[3.0, -7.0], [-1.0, 5.0]])
z = tf.constant([[[1, 1], [2.0, 2]], [[3, 3], [-1, -1]]])
w = tf.matmul(tf.matmul(x, y), z)
Here w should be differentiated with respect to each of the variables x, y and z.
TensorFlow calculates these gradients:
[<tf.Tensor: shape=(2, 2, 2), dtype=float32, numpy=
array([[[-22., 18.],
[-22., 18.]],
[[ 32., -16.],
[ 32., -16.]]], dtype=float32)>, <tf.Tensor: shape=(2, 2), dtype=float32, numpy=
array([[66., -8.],
[80., -8.]], dtype=float32)>, <tf.Tensor: shape=(2, 2, 2), dtype=float32, numpy=
array([[[ 5., 5.],
[ -1., -1.]],
[[ 18., 18.],
[-10., -10.]]], dtype=float32)>]
My implementation calculates:
[[[-5. 7.]
[-5. 7.]]
[[-5. 7.]
[-5. 7.]]]
[[33. 22.]
[54. 36.]]
[[[ 9. 9.]
[14. 14.]]
[[-5. -5.]
[-6. -6.]]]
Maybe the problem is the difference between NumPy's dot and TensorFlow's matmul?
But then I don't know how to fix the gradient or the unbroadcast for the TensorFlow method...
Thanks for taking the time to look over my code!
Answer 1
Score: 1
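I found the error. The matmul gradient should have been:
def gradient_matmul(node, dx, adj):
    a = node.parents[0]
    b = node.parents[1]
    if a == dx:
        return tf.matmul(adj, b.tensor, transpose_b=True)
    elif b == dx:
        return tf.matmul(a.tensor, adj, transpose_a=True)
    else:
        return None
The reason is that I only want to transpose the last two dimensions. tf.transpose without a perm argument reverses all axes, so on a 3-D tensor it also swaps the batch axis, whereas transpose_a and transpose_b swap only the last two axes.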
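As a rough cross-check of this fix, the sketch below recomputes the three gradients for the test case from the question. It uses NumPy in place of TensorFlow (an assumption, so that it runs without TensorFlow); `t` is a hypothetical helper that mimics `transpose_a`/`transpose_b` by swapping only the last two axes, and the final `sum(axis=0)` plays the role of unbroadcast for the 2-D operand y.

```python
import numpy as np

x = np.array([[[1.0, 1.0], [2.0, 3.0]], [[4.0, 5.0], [6.0, 7.0]]])
y = np.array([[3.0, -7.0], [-1.0, 5.0]])
z = np.array([[[1.0, 1.0], [2.0, 2.0]], [[3.0, 3.0], [-1.0, -1.0]]])

def t(m):
    """Hypothetical helper: transpose only the last two axes."""
    return np.swapaxes(m, -1, -2)

# np.transpose without axes reverses ALL axes (as tf.transpose does without perm),
# which is a different tensor from the batch-wise transpose.
full_transpose_differs = not np.array_equal(np.transpose(z), t(z))

xy = np.matmul(x, y)                          # first product, shape (2, 2, 2)
adj_w = np.ones_like(np.matmul(xy, z))        # seed adjoint of the output w
adj_xy = np.matmul(adj_w, t(z))               # adjoint of the intermediate x @ y
grad_z = np.matmul(t(xy), adj_w)
grad_x = np.matmul(adj_xy, t(y))
grad_y = np.matmul(t(x), adj_xy).sum(axis=0)  # sum over the batch axis (unbroadcast)

print(grad_x, grad_y, grad_z, sep="\n")
```

Under these assumptions the three results equal the TensorFlow gradients listed in the question, which supports the diagnosis that the full-axis transpose was the cause of the mismatch.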