Question

I'd like a simple example to illustrate how gradient clipping via clip_grad_norm_ works. From this post, I understood that if the norm of a gradient is greater than a threshold, it simply takes the unit vector of the gradient and multiplies it by the threshold. Here is what I tried:

import torch

v = torch.rand(5) * 1000
v_1 = v.clone()
torch.nn.utils.clip_grad_norm_(v_1, max_norm=1.0, norm_type=2)
print(v, v_1)

(tensor([381.2621, 935.3613, 664.9132, 840.0740, 443.0156]),
 tensor([381.2621, 935.3613, 664.9132, 840.0740, 443.0156]))

I'd have thought it would compute v / torch.norm(v, p=2) * 1.0 (the unit vector scaled by max_norm), which should give me tensor([0.2480, 0.6083, 0.4324, 0.5463, 0.2881]).

It doesn't seem to do anything. I thought max_norm was the threshold value (the PyTorch documentation wasn't very clear on this). This post wasn't too helpful either.

Answer 1

Score: 1

This happens because torch.nn.utils.clip_grad_norm_ clips the gradients (accessed via Tensor.grad), not the tensor values themselves. In your snippet v_1 has no gradient, so the call has nothing to clip. A quick example of use:

import torch

v = torch.rand(5) * 1000
v_1 = v.clone()
v.requires_grad_(True)
v_1.requires_grad_(True)

loss = 1/2 * torch.sum(v_1 * v_1 + v * v)
# The gradients of loss w.r.t. v and v_1 are v and v_1 themselves
loss.backward()

# Clip only the gradient of v_1 (in place); v.grad is left untouched
torch.nn.utils.clip_grad_norm_(v_1, max_norm=1.0, norm_type=2)

print(v.grad)                                # unclipped gradient
print(v_1.grad)                              # clipped gradient
print(v.grad / torch.norm(v.grad, p=2))      # manual unit vector, for comparison

Results in:

tensor([486.8801, 481.7880, 172.6818, 659.4149,  62.8158])  # v.grad: not clipped
tensor([0.5028, 0.4975, 0.1783, 0.6809, 0.0649])  # v_1.grad: clipped!
tensor([0.5028, 0.4975, 0.1783, 0.6809, 0.0649])  # same values as the manual computation
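
To make the rule from the question concrete, here is a minimal sketch that reproduces the clipping by hand and compares it with the built-in. The helper clip_by_global_norm and its eps argument are hypothetical names made up for illustration; the small epsilon in the denominator and the cap of the scale factor at 1 are my understanding of how recent PyTorch versions behave, so treat them as assumptions. The sketch also assumes a PyTorch version in which clip_grad_norm_ returns the pre-clipping norm as a tensor.

```python
import torch


def clip_by_global_norm(grads, max_norm, eps=1e-6):
    """Hypothetical helper: rescale grads so their combined L2 norm is at most max_norm."""
    # Combined norm over all gradients, treated as one long vector.
    total_norm = torch.norm(torch.stack([torch.norm(g, p=2) for g in grads]), p=2)
    # The scale is capped at 1, so gradients already under the threshold stay unchanged.
    scale = torch.clamp(max_norm / (total_norm + eps), max=1.0)
    return [g * scale for g in grads], total_norm


torch.manual_seed(0)
w = (torch.rand(5) * 1000).requires_grad_(True)
loss = 0.5 * torch.sum(w * w)  # d(loss)/dw == w
loss.backward()

# Manual clipping on a copy of the gradient.
expected, norm_before = clip_by_global_norm([w.grad.clone()], max_norm=1.0)

# The built-in modifies w.grad in place and returns the norm measured before clipping.
returned_norm = torch.nn.utils.clip_grad_norm_([w], max_norm=1.0, norm_type=2)

print(torch.allclose(w.grad, expected[0]))         # manual and built-in results agree
print(torch.allclose(returned_norm, norm_before))  # return value is the pre-clip norm
print(w.grad.norm())                               # approximately max_norm

# A tensor with no .grad, as in the question, is skipped entirely.
plain = torch.rand(5) * 1000
before = plain.clone()
torch.nn.utils.clip_grad_norm_([plain], max_norm=1.0)
print(torch.equal(plain, before))
```

In a training loop the call normally goes between loss.backward() and optimizer.step(), with model.parameters() passed as the first argument, so that the optimizer sees the clipped gradients.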
