How to scale a gradient norm in Keras
Question
In the pseudocode for MuZero, they do the following:
hidden_state = tf.scale_gradient(hidden_state, 0.5)
From this question about what that line means, I learned that it is likely a form of gradient norm scaling.
How can I apply gradient norm scaling (clipping the gradient norm to a particular length) to a hidden state in Keras? Later on, they also apply the same scaling to a loss value:
loss += tf.scale_gradient(l, gradient_scale)
This site says that I should use the clipnorm parameter on the optimizer. But I don't think that will work, because I'm scaling the gradients before they reach the optimizer (and especially since I'm scaling different things to different lengths).
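For concreteness, the optimizer-level clipping that site describes looks something like the sketch below (a per-gradient norm limit applied uniformly to every variable, which is exactly why it doesn't fit my per-tensor case):

from tensorflow import keras

# Clips each weight gradient so its norm is at most 1.0 before the update
# step -- one uniform setting for all variables, not per-tensor.
optimizer = keras.optimizers.SGD(clipnorm=1.0)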
Here is the particular code in question from the paper, in case it is helpful. (Note that scale_gradient is not an actual Tensorflow function. See the previously linked question if you are confused, as I was.)
def update_weights(optimizer: tf.train.Optimizer, network: Network, batch,
                   weight_decay: float):
  loss = 0
  for image, actions, targets in batch:
    # Initial step, from the real observation.
    value, reward, policy_logits, hidden_state = network.initial_inference(
        image)
    predictions = [(1.0, value, reward, policy_logits)]

    # Recurrent steps, from action and previous hidden state.
    for action in actions:
      value, reward, policy_logits, hidden_state = network.recurrent_inference(
          hidden_state, action)
      predictions.append((1.0 / len(actions), value, reward, policy_logits))

      # THIS LINE HERE
      hidden_state = tf.scale_gradient(hidden_state, 0.5)

    for prediction, target in zip(predictions, targets):
      gradient_scale, value, reward, policy_logits = prediction
      target_value, target_reward, target_policy = target

      l = (
          scalar_loss(value, target_value) +
          scalar_loss(reward, target_reward) +
          tf.nn.softmax_cross_entropy_with_logits(
              logits=policy_logits, labels=target_policy))

      # AND AGAIN HERE
      loss += tf.scale_gradient(l, gradient_scale)

  for weights in network.get_weights():
    loss += weight_decay * tf.nn.l2_loss(weights)

  optimizer.minimize(loss)
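For reference, a definition of scale_gradient that appears in public MuZero reimplementations is the tf.stop_gradient identity trick sketched below (my assumption, since the paper never defines the function; note it scales the gradient rather than clipping it):

import tensorflow as tf

def scale_gradient(tensor, scale):
  # Forward pass: tensor * scale + tensor * (1 - scale) == tensor (identity).
  # Backward pass: the gradient flows only through the first term, so it is
  # multiplied by `scale`.
  return tensor * scale + tf.stop_gradient(tensor) * (1 - scale)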
(Note that this question is different from this one, which asks about multiplying the gradient by a value rather than clipping it to a particular magnitude.)
Answer 1
Score: 1
You can use the MaxNorm constraint presented here.
It's very simple and straightforward. Import it with from keras.constraints import MaxNorm.
If you want to apply it to weights, you pass kernel_constraint = MaxNorm(max_value=2, axis=0) when you define a Keras layer (read the linked page for details on axis).
You can also use bias_constraint = ...
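A minimal sketch of wiring the constraint into a layer (the layer size and max_value here are arbitrary examples of my own):

from keras.constraints import MaxNorm
from keras.layers import Dense

# Re-normalizes each column of the kernel (axis=0) to a norm of at most 2
# after every weight update; the bias vector gets the same treatment.
layer = Dense(64,
              kernel_constraint=MaxNorm(max_value=2, axis=0),
              bias_constraint=MaxNorm(max_value=2))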
If you want to apply it to any other tensor, you can simply call it with a tensor:
normalizer = MaxNorm(max_value=2, axis=0)
normalized_tensor = normalizer(original_tensor)
And you can see the source code is pretty simple:

def __call__(self, w):
    norms = K.sqrt(K.sum(K.square(w), axis=self.axis, keepdims=True))
    desired = K.clip(norms, 0, self.max_value)
    return w * (desired / (K.epsilon() + norms))