ValueError: non-broadcastable output operand with shape (1,64) doesn't match the broadcast shape (2,64)
Question
I am following the instructions to make a neural network from the book "Neural Networks from Scratch in Python" and I have gotten to the SGD optimizer with learning rate decay, but when I try to add momentum to it, I get this error:
Traceback (most recent call last):
  File "nn.py", line 142, in <module>
    optimizer.update_params(dense1)
  File "nn.py", line 110, in update_params
    layer.biases += bias_updates
ValueError: non-broadcastable output operand with shape (1,64) doesn't match the broadcast shape (2,64)
To me this seems to be an error with NumPy, but I have barely any experience with it and I don't understand this error well.
Here is my code:
import numpy as np
from nnfs.datasets import spiral_data
import nnfs
import pickle

nnfs.init()

class Layer_Dense:
    def __init__(self, n_inputs, n_neurons):
        self.weights = 0.01 * np.random.randn(n_inputs, n_neurons)
        self.biases = np.zeros((1, n_neurons))
    def forward(self, inputs):
        self.output = np.dot(inputs, self.weights) + self.biases
        self.inputs = inputs
    def backward(self, dvalues):
        self.dweights = np.dot(self.inputs.T, dvalues)
        self.dbiases = np.sum(dvalues, axis=0, keepdims=True)
        self.dinputs = np.dot(dvalues, self.weights.T)

class Activation_ReLU:
    def forward(self, inputs):
        self.output = np.maximum(0, inputs)
        self.inputs = inputs
    def backward(self, dvalues):
        self.dinputs = dvalues.copy()
        self.dinputs[self.inputs <= 0] = 0

class Activation_Softmax:
    def forward(self, inputs):
        exp_values = np.exp(inputs - np.max(inputs, axis=1, keepdims=True))
        probabilities = exp_values / np.sum(exp_values, axis=1, keepdims=True)
        self.output = probabilities
    def backward(self, dvalues):
        self.dinputs = np.empty_like(dvalues)
        for index, (single_output, single_dvalues) in enumerate(zip(self.output, dvalues)):
            single_output = single_output.reshape(-1, 1)
            jacobian_matrix = np.diagflat(single_output) - np.dot(single_output, single_output.T)
            self.dinputs[index] = np.dot(jacobian_matrix, single_dvalues)

class Loss:
    def calculate(self, output, y):
        sample_losses = self.forward(output, y)
        data_loss = np.mean(sample_losses)
        return data_loss

class Loss_CategoricalCrossEntropy(Loss):
    def forward(self, y_pred, y_true):
        samples = len(y_pred)
        y_pred_clipped = np.clip(y_pred, 1e-7, 1 - 1e-7)
        if len(y_true.shape) == 1:
            correct_confidences = y_pred_clipped[
                range(samples),
                y_true
            ]
        elif len(y_true.shape) == 2:
            correct_confidences = np.sum(
                y_pred_clipped * y_true,
                axis=1
            )
        negative_log_likelihoods = -np.log(correct_confidences)
        return negative_log_likelihoods
    def backward(self, dvalues, y_true):
        samples = len(dvalues)
        labels = len(dvalues[0])
        if len(y_true.shape) == 1:
            y_true = np.eye(labels)[y_true]
        self.dinputs = -y_true / dvalues
        self.dinputs = self.dinputs / samples

class Activation_Softmax_Loss_CategoricalCrossEntropy():
    def __init__(self):
        self.activation = Activation_Softmax()
        self.loss = Loss_CategoricalCrossEntropy()
    def forward(self, inputs, y_true):
        self.activation.forward(inputs)
        self.output = self.activation.output
        return self.loss.calculate(self.output, y_true)
    def backward(self, dvalues, y_true):
        samples = len(dvalues)
        if len(y_true.shape) == 2:
            y_true = np.argmax(y_true, axis=1)
        self.dinputs = dvalues.copy()
        self.dinputs[range(samples), y_true] -= 1
        self.dinputs = self.dinputs / samples

class Optimizer_SGD:
    def __init__(self, learning_rate=1., decay=0., momentum=0.):
        self.learning_rate = learning_rate
        self.current_learning_rate = learning_rate
        self.decay = decay
        self.iterations = 0
        self.momentum = momentum
    def pre_update_params(self):
        if self.decay:
            self.current_learning_rate = self.learning_rate * (1. / (1. + self.decay * self.iterations))
    def update_params(self, layer):
        if self.momentum:
            if not hasattr(layer, 'weight_momentums'):
                layer.weight_momentums = np.zeros_like(layer.weights)
                layer.bias_momentums = np.zeros_like(layer.biases)
            weight_updates = self.momentum * layer.weight_momentums - self.current_learning_rate * layer.dweights
            layer.weight_momentums = weight_updates
            bias_updates = self.momentum * layer.bias_momentums - self.current_learning_rate * layer.dweights
            layer.bias_momentums = bias_updates
        else:
            weight_updates = -self.current_learning_rate * layer.dweights
            bias_updates = -self.current_learning_rate * layer.dbiases
        layer.weights += weight_updates
        layer.biases += bias_updates
    def post_update_params(self):
        self.iterations += 1

X, y = spiral_data(samples=100, classes=3)
optimizer = Optimizer_SGD(decay=1e-3, momentum=0.5)
dense1 = Layer_Dense(2, 64)
activation1 = Activation_ReLU()
dense2 = Layer_Dense(64, 3)
loss_activation = Activation_Softmax_Loss_CategoricalCrossEntropy()

for epoch in range(10001):
    dense1.forward(X)
    activation1.forward(dense1.output)
    dense2.forward(activation1.output)
    loss = loss_activation.forward(dense2.output, y)
    predictions = np.argmax(loss_activation.output, axis=1)
    if len(y.shape) == 2:
        y = np.argmax(y, axis=1)
    accuracy = np.mean(predictions == y)
    if not epoch % 100:
        print(f'epoch: {epoch}, acc: {accuracy:.3f}, loss: {loss:.3f}, lr: {optimizer.current_learning_rate}')
    loss_activation.backward(loss_activation.output, y)
    dense2.backward(loss_activation.dinputs)
    activation1.backward(dense2.dinputs)
    dense1.backward(activation1.dinputs)
    optimizer.pre_update_params()
    optimizer.update_params(dense1)
    optimizer.update_params(dense2)
    optimizer.post_update_params()

stream = [dense1.weights, dense1.biases, dense2.weights, dense2.biases]
with open("trained.nn", "wb") as h:
    pickle.dump(stream, h)
I know pickle isn't the best way, but it was easiest for me to program.
I double- and triple-checked the code against the book and I am sure I copied it correctly. However, I doubt that, because then it would be an error in the book, and those aren't that common in my experience.
Answer 1
Score: 0
> ValueError: non-broadcastable output operand with shape (1,64) doesn't match the broadcast shape (2,64)
This error means that numpy can't broadcast two arrays together. Broadcasting happens when you try to use arithmetic operations on two arrays of different shapes. In this case you're trying to add an array bias_updates of shape (2, 64) to an array layer.biases of shape (1, 64).
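To illustrate, here is a minimal standalone snippet (with hypothetical arrays matching the shapes in the traceback) that reproduces the error. A plain + would broadcast the (1, 64) array up to (2, 64), but in-place += has to write the result back into the (1, 64) operand, which NumPy refuses:

import numpy as np

biases = np.zeros((1, 64))       # same shape as layer.biases
bias_updates = np.ones((2, 64))  # same shape as a dweights-based update

out = biases + bias_updates      # fine: result takes the broadcast shape (2, 64)
biases += bias_updates           # ValueError: non-broadcastable output operand
                                 # with shape (1,64) doesn't match the broadcast
                                 # shape (2,64)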
I would recommend that you investigate why the array bias_updates is of shape (2, 64) when you expect it to be of shape (1, 64).
Given this part of your code:
bias_updates = (
    self.momentum * layer.bias_momentums
    - self.current_learning_rate * layer.dweights
)
The shape of bias_updates comes from the shape of the array layer.dweights (numpy is silently broadcasting your arrays).
In general, to debug this kind of error, write down the expected shape of each of your arrays and check at runtime that they match.
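Following that through here: in the posted update_params, the momentum branch computes bias_updates from layer.dweights (shape (2, 64) for a Layer_Dense(2, 64)) instead of layer.dbiases (shape (1, 64)), which is what the non-momentum branch uses. A minimal sketch of the corrected method, assuming the rest of the Optimizer_SGD class stays as posted:

def update_params(self, layer):
    if self.momentum:
        # create the momentum buffers on first use
        if not hasattr(layer, 'weight_momentums'):
            layer.weight_momentums = np.zeros_like(layer.weights)
            layer.bias_momentums = np.zeros_like(layer.biases)
        weight_updates = (self.momentum * layer.weight_momentums
                          - self.current_learning_rate * layer.dweights)
        layer.weight_momentums = weight_updates
        # dbiases has shape (1, n_neurons), matching layer.biases;
        # dweights has shape (n_inputs, n_neurons) and triggers the ValueError
        bias_updates = (self.momentum * layer.bias_momentums
                        - self.current_learning_rate * layer.dbiases)
        layer.bias_momentums = bias_updates
    else:
        weight_updates = -self.current_learning_rate * layer.dweights
        bias_updates = -self.current_learning_rate * layer.dbiases
    layer.weights += weight_updates
    layer.biases += bias_updates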
Comments