An example of how pytorch clip_grad_norm_ works

Question

I'd like a simple example to illustrate how gradient clipping via clip_grad_norm_ works. From this post, I found that if the norm of a gradient is greater than a threshold, then it simply takes the unit vector of the gradient and multiplies it by the threshold. This is what I tried:

import torch

v = torch.rand(5) * 1000
v_1 = v.clone()
torch.nn.utils.clip_grad_norm_(v_1, max_norm=1.0, norm_type=2)
print(v, v_1)

(tensor([381.2621, 935.3613, 664.9132, 840.0740, 443.0156]),
 tensor([381.2621, 935.3613, 664.9132, 840.0740, 443.0156]))

I expected it to compute v / torch.norm(v, p=2) * max_norm, which with max_norm=1.0 should give me tensor([0.2480, 0.6083, 0.4324, 0.5463, 0.2881]).

It doesn't seem to do anything. I thought max_norm was the threshold value (the PyTorch documentation wasn't very clear on this). This post wasn't too helpful either.

Answer 1

Score: 1

This happens because torch.nn.utils.clip_grad_norm_ clips the gradient values (accessed via Tensor.grad), not the tensor values themselves. In your snippet v_1 has no gradient yet, so there is nothing to clip. Quick example of use:

import torch

v = torch.rand(5) * 1000
v_1 = v.clone()
v.requires_grad_(True)
v_1.requires_grad_(True)

loss = 1/2 * torch.sum(v_1 * v_1 + v * v)
# The gradients of loss w.r.t. v and v_1 are v and v_1 themselves, respectively
loss.backward()

# Clip only the gradient of v_1; v.grad is left as-is for comparison
torch.nn.utils.clip_grad_norm_(v_1, max_norm=1.0, norm_type=2)

print(v.grad)
print(v_1.grad)
print(v.grad / torch.norm(v.grad, p=2))

Results in:

tensor([486.8801, 481.7880, 172.6818, 659.4149,  62.8158])  # not clipped
tensor([0.5028, 0.4975, 0.1783, 0.6809, 0.0649])  # clipped!
tensor([0.5028, 0.4975, 0.1783, 0.6809, 0.0649])  # same values
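
To make the rescaling explicit, here is a minimal sketch of the same clipping done by hand and compared against the built-in call. The helper name clip_grad_norm_sketch and the eps value of 1e-6 are my own assumptions for illustration and not part of the PyTorch API; the sketch only covers the 2-norm case.

```python
import torch

def clip_grad_norm_sketch(params, max_norm, eps=1e-6):
    """Hypothetical helper: rescale .grad in place so the combined L2 norm is at most max_norm."""
    grads = [p.grad for p in params if p.grad is not None]
    if not grads:
        # Nothing has a gradient yet, so there is nothing to clip
        return torch.tensor(0.0)
    # Norm over all gradients taken together, as if concatenated into one vector
    total_norm = torch.norm(torch.stack([torch.norm(g, p=2) for g in grads]), p=2)
    # The scale factor is capped at 1, so gradients already below max_norm stay untouched
    clip_coef = torch.clamp(max_norm / (total_norm + eps), max=1.0)
    for g in grads:
        g.mul_(clip_coef)
    return total_norm

torch.manual_seed(0)
a = (torch.rand(5) * 1000).requires_grad_(True)
b = a.detach().clone().requires_grad_(True)

(0.5 * (a * a).sum()).backward()  # a.grad equals a
(0.5 * (b * b).sum()).backward()  # b.grad equals b

norm_sketch = clip_grad_norm_sketch([a], max_norm=1.0)
norm_torch = torch.nn.utils.clip_grad_norm_([b], max_norm=1.0, norm_type=2)

print(a.grad)                          # clipped by hand
print(b.grad)                          # clipped by PyTorch
print(torch.allclose(a.grad, b.grad))  # the two should agree
```

Both calls return the norm measured before clipping, which is handy for logging. In a training loop the clipping call would normally sit between loss.backward() and optimizer.step().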

huangapple
  • Posted on August 5, 2023, 02:04:50
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76838254.html