ValueError: non-broadcastable output operand with shape (1,64) doesn't match the broadcast shape (2,64)

Question

I am following the book "Neural Networks from Scratch in Python" to build a neural network and have gotten to the SGD optimizer with learning rate decay, but when I try to add momentum to it, I get this error:

Traceback (most recent call last):
  File "nn.py", line 142, in <module>
    optimizer.update_params(dense1)
  File "nn.py", line 110, in update_params
    layer.biases += bias_updates
ValueError: non-broadcastable output operand with shape (1,64) doesn't match the broadcast shape (2,64)

To me this seems to be an error from NumPy, but I have barely any experience with it and I don't understand this error well.

Here is my code:


import numpy as np
from nnfs.datasets import spiral_data
import nnfs
import pickle

nnfs.init()

class Layer_Dense:
    def __init__(self, n_inputs, n_neurons):
        self.weights = 0.01 * np.random.randn(n_inputs, n_neurons)
        self.biases = np.zeros((1, n_neurons))
    
    def forward(self, inputs):
        self.output = np.dot(inputs, self.weights) + self.biases
        self.inputs = inputs
    
    def backward(self, dvalues):
        self.dweights = np.dot(self.inputs.T, dvalues)
        self.dbiases = np.sum(dvalues, axis=0, keepdims=True)
        self.dinputs = np.dot(dvalues, self.weights.T)
        
class Activation_ReLU:
    def forward(self, inputs):
        self.output = np.maximum(0, inputs)
        self.inputs = inputs

    def backward(self, dvalues):
        self.dinputs = dvalues.copy()
        self.dinputs[self.inputs <= 0] = 0

class Activation_Softmax:
    def forward(self, inputs):
        exp_values = np.exp(inputs - np.max(inputs, axis=1, keepdims=True))
        probabilities = exp_values / np.sum(exp_values, axis=1, keepdims=True)
        self.output = probabilities
    def backward(self, dvalues):
        self.dinputs = np.empty_like(dvalues)
        for index, (single_output, single_dvalues) in enumerate(zip(self.output, dvalues)):
            single_output = single_output.reshape(-1, 1)
            jacobian_matrix = np.diagflat(single_output) - np.dot(single_output,single_output.T)
            self.dinputs[index] = np.dot(jacobian_matrix, single_dvalues)

class Loss:
    def calculate(self, output, y):
        sample_losses = self.forward(output, y)
        data_loss = np.mean(sample_losses)
        return data_loss

class Loss_CategoricalCrossEntropy(Loss):
    def forward(self, y_pred, y_true):
        samples = len(y_pred)
        y_pred_clipped = np.clip(y_pred, 1e-7, 1-1e-7)
        if len(y_true.shape) == 1:
            correct_confidences = y_pred_clipped[
                range(samples),
                y_true
            ]
        elif len(y_true.shape) == 2:
            correct_confidences = np.sum(
                y_pred_clipped * y_true,
                axis=1
            )
        negative_log_likelihoods = -np.log(correct_confidences)
        return negative_log_likelihoods
    def backward(self, dvalues, y_true):
        samples = len(dvalues)
        labels = len(dvalues[0])
        if len(y_true.shape) == 1:
            y_true = np.eye(labels)[y_true]
        self.dinputs = -y_true / dvalues
        self.dinputs = self.dinputs / samples

class Activation_Softmax_Loss_CategoricalCrossEntropy():
    def __init__(self):
        self.activation = Activation_Softmax()
        self.loss = Loss_CategoricalCrossEntropy()
    def forward(self, inputs, y_true):
        self.activation.forward(inputs)
        self.output = self.activation.output
        return self.loss.calculate(self.output, y_true)
    def backward(self, dvalues, y_true):
        samples = len(dvalues)
        if len(y_true.shape) == 2:
            y_true = np.argmax(y_true, axis=1)
        self.dinputs = dvalues.copy()
        self.dinputs[range(samples), y_true] -= 1
        self.dinputs = self.dinputs / samples

class Optimizer_SGD:
    def __init__(self, learning_rate=1., decay=0., momentum=0.):
        self.learning_rate = learning_rate
        self.current_learning_rate = learning_rate
        self.decay = decay
        self.iterations = 0
        self.momentum = momentum

    def pre_update_params(self):
        if self.decay:
            self.current_learning_rate = self.learning_rate * (1. / (1. + self.decay * self.iterations))
    
    def update_params(self, layer):
        if self.momentum:
            if not hasattr(layer, 'weight_momentums'):
                layer.weight_momentums = np.zeros_like(layer.weights)
                layer.bias_momentums = np.zeros_like(layer.biases)
            weight_updates = self.momentum * layer.weight_momentums - self.current_learning_rate * layer.dweights
            layer.weight_momentums = weight_updates
            bias_updates = self.momentum * layer.bias_momentums - self.current_learning_rate * layer.dweights
            layer.bias_momentums = bias_updates
        else:
            weight_updates = -self.current_learning_rate * layer.dweights
            bias_updates = -self.current_learning_rate * layer.dbiases
        layer.weights += weight_updates
        layer.biases += bias_updates
    def post_update_params(self):
        self.iterations += 1

X, y = spiral_data(samples=100, classes=3)

optimizer = Optimizer_SGD(decay=1e-3, momentum=0.5)
dense1 = Layer_Dense(2, 64)
activation1 = Activation_ReLU()
dense2 = Layer_Dense(64, 3)
loss_activation = Activation_Softmax_Loss_CategoricalCrossEntropy()

for epoch in range(10001):
    dense1.forward(X)
    activation1.forward(dense1.output)
    dense2.forward(activation1.output)
    loss = loss_activation.forward(dense2.output, y)
    predictions = np.argmax(loss_activation.output, axis=1)
    
    if len(y.shape) == 2:
        y = np.argmax(y, axis=1)
    accuracy = np.mean(predictions==y)
    
    if not epoch % 100:
        print(f'epoch: {epoch}, acc: {accuracy:.3f}, loss: {loss:.3f}, lr: {optimizer.current_learning_rate}')
        
    loss_activation.backward(loss_activation.output, y)
    dense2.backward(loss_activation.dinputs)
    activation1.backward(dense2.dinputs)
    dense1.backward(activation1.dinputs)
    
    optimizer.pre_update_params()
    optimizer.update_params(dense1)
    optimizer.update_params(dense2)
    optimizer.post_update_params()

stream = [dense1.weights, dense1.biases, dense2.weights, dense2.biases]
with open("trained.nn", "wb") as h:
    pickle.dump(stream, h)

I know pickle isn't the best way, but it was easiest for me to program.
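For reference, a minimal sketch of loading the dumped parameters back (the mirror of the dump above, assuming the same two Layer_Dense objects have been constructed first):

import pickle

# Restore the trained parameters into freshly constructed layers.
with open("trained.nn", "rb") as h:
    dense1.weights, dense1.biases, dense2.weights, dense2.biases = pickle.load(h)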

I double- and triple-checked that I had typed the code correctly, and I am fairly sure I have. Then again, that would mean the error is in the book, and in my experience errors there aren't that common.

Answer 1

Score: 0

> ValueError: non-broadcastable output operand with shape (1,64) doesn't match the broadcast shape (2,64)

This error means that NumPy can't write the result of a broadcast operation back into the output array. Broadcasting happens when you use arithmetic operations on two arrays of different shapes. In this case you're adding an array bias_updates of shape (2, 64) in place to an array layer.biases of shape (1, 64): the two operands broadcast to (2, 64), but the += has to store the result back in layer.biases, which can't grow to that shape.
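A minimal repro of the same error, independent of the network code (hypothetical shapes chosen to match yours):

import numpy as np

biases = np.zeros((1, 64))   # same shape as layer.biases
updates = np.zeros((2, 64))  # same shape as the bad bias_updates

# A plain `biases + updates` would broadcast fine to (2, 64), but the
# in-place `+=` must write the result back into biases, which cannot
# be resized from (1, 64) to (2, 64) -- hence the ValueError.
biases += updates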

I would recommend that you investigate why the array bias_updates has shape (2, 64) when you expect it to have shape (1, 64).

Given this part of your code:

bias_updates = (
    self.momentum * layer.bias_momentums
    - self.current_learning_rate * layer.dweights
)

The shape of bias_updates comes from the shape of the array layer.dweights (NumPy silently broadcasts your arrays during the subtraction).
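Concretely, for dense1 the gradient layer.dweights has shape (2, 64) (two input features, 64 neurons) while layer.dbiases has shape (1, 64). The momentum branch presumably meant to use layer.dbiases, as the non-momentum branch already does:

bias_updates = (
    self.momentum * layer.bias_momentums
    - self.current_learning_rate * layer.dbiases  # dbiases, not dweights
)
layer.bias_momentums = bias_updates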

In general, to debug this kind of error, write down the expected shape of each of your arrays and check at runtime that they match.
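For example, a few shape assertions (hypothetical, for debugging only) dropped into update_params would have pointed straight at the offending line:

# Hypothetical debugging assertions for update_params: every update
# must have the same shape as the parameter it is applied to.
assert weight_updates.shape == layer.weights.shape, \
    f"weight_updates {weight_updates.shape} != weights {layer.weights.shape}"
assert bias_updates.shape == layer.biases.shape, \
    f"bias_updates {bias_updates.shape} != biases {layer.biases.shape}"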
