How to solve broadcast issue in Deep Learning?
Question
I have a broadcasting issue in my code.
I got this error at Step 8:
ValueError: non-broadcastable output operand with shape (1062433,1) doesn't match the broadcast shape (1062433,2)
This is my code:
Step 1: Read the data
df = pd.read_csv('file1.csv')
Step 2: Split the data into training and testing sets
train_end_date = df['Date'].max() - pd.DateOffset(years=5)
train_data = df[df['Date'] <= train_end_date]
test_data = df[df['Date'] > train_end_date]
Step 3: Normalize the data
scaler = MinMaxScaler()
cols_to_scale = ['feature1', 'feature2']
train_data_scaled = scaler.fit_transform(train_data[cols_to_scale])
test_data_scaled = scaler.transform(test_data[cols_to_scale])
print("Step 3")
print("train_data_scaled", train_data_scaled.shape)
print("test_data_scaled", test_data_scaled.shape)
Step 4: Prepare the input sequences and labels
def create_sequences(data, sequence_length):
    X, y = [], []
    for i in range(len(data) - sequence_length):
        X.append(data[i:i+sequence_length, :])
        y.append(data[i+sequence_length, 0])
    return np.array(X), np.array(y)
sequence_length = 10
X_train, y_train = create_sequences(train_data_scaled, sequence_length)
X_test, y_test = create_sequences(test_data_scaled, sequence_length)
print("Step 4")
print("X_train", X_train.shape)
print("y_train", y_train.shape)
print("X_test", X_test.shape)
print("y_test", y_test.shape)
Step 5: Build the Transformer model
input_shape = (sequence_length, X_train.shape[2])
inputs = Input(shape=input_shape)
x = inputs
num_layers = 2
d_model = 32
num_heads = 4
dff = 64
dropout_rate = 0.1
for _ in range(num_layers):
    x = MultiHeadAttention(num_heads=num_heads, key_dim=d_model)(x, x)
    x = Dropout(dropout_rate)(x)
    x = LayerNormalization(epsilon=1e-6)(x)
x = tf.keras.layers.GlobalAveragePooling1D()(x)
x = Dense(units=1)(x)
model = Model(inputs=inputs, outputs=x)
Step 6: Compile and train the model
model.compile(optimizer='adam', loss='mean_squared_error', metrics=['mae'])
model.fit(X_train, y_train, epochs=1, batch_size=32)
Step 7: Evaluate the model
train_predictions = model.predict(X_train)
test_predictions = model.predict(X_test)
print("Step 7")
print("train_predictions at evaluation", train_predictions.shape)
print("test_predictions at evaluation", test_predictions.shape)
Step 8: Inverse transform the predictions to obtain the actual values - HERE IS THE ERROR
train_predictions = scaler.inverse_transform(train_predictions.reshape(-1, 1)).flatten()
test_predictions = scaler.inverse_transform(test_predictions.reshape(-1, 1)).flatten()
print("Step 8")
print("train_predictions after inversion", train_predictions.shape)
print("test_predictions after inversion", test_predictions.shape)
As requested in the comments, I printed the shapes at the most significant steps of the code to find out where it breaks. These are the results:
Step 3
train_data_scaled (1062443, 2)
test_data_scaled (308138, 2)
Step 4
X_train (1062433, 10, 2)
y_train (1062433,)
X_test (308128, 10, 2)
y_test (308128,)
33202/33202 [==============================] - 431s 13ms/step - loss: 2.1277e-05 - mae: 4.1088e-04
33202/33202 [==============================] - 203s 6ms/step
9629/9629 [==============================] - 55s 6ms/step
Step 7
train_predictions at evaluation (1062433, 1)
test_predictions at evaluation (308128, 1)
The shape definitely changed during training, but how can I solve this issue?
Thanks so much.
Answer 1
Score: 1
The root cause of the error is a shape mismatch between the arrays.
See Step 3:
cols_to_scale = ['feature1', 'feature2']
train_data_scaled = scaler.fit_transform(train_data[cols_to_scale])
# train_data_scaled has shape (1062443, 2)
Note that train_data has 2 features.
Steps 7-8:
train_predictions = model.predict(X_train)
# train_predictions has shape (1062433, 1)
train_predictions = scaler.inverse_transform(train_predictions.reshape(-1, 1)).flatten()
Note that the output data (the model's predictions) has only one feature.
Therefore, it throws an exception
ValueError: non-broadcastable output operand with shape (1062433,1) doesn't match the broadcast shape (1062433,2)
because scaler.inverse_transform() expects its input to have the same number of features as train_data_scaled, which was used to fit it.
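The wording of the error suggests that the scaler undoes the scaling with in-place array arithmetic against one value per fitted feature. The following is a minimal NumPy-only sketch of that situation, not the scaler's actual code; the array names and the offset values are made up for illustration.

```python
import numpy as np

n_samples = 5
# One column, like the output of model.predict()
predictions = np.zeros((n_samples, 1))
# One value per fitted feature (hypothetical numbers)
per_feature_offset = np.array([0.1, 0.2])

try:
    # An in-place operation must write its result back into `predictions`,
    # but broadcasting (5, 1) against (2,) produces shape (5, 2),
    # which does not fit into the (5, 1) output array.
    predictions -= per_feature_offset
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

The printed message has the same form as the one in the question, only with 5 rows instead of 1062433.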
Solution
I assume the model's output has the same meaning as the first input feature (feature1), because create_sequences() takes the labels from column 0 of the scaled data.
You can trick inverse_transform() by padding the output array train_predictions with arbitrary values so that it has the 2 features the scaler expects, then keeping only the first column of the result.
import numpy as np

def padding_output(y):
    # Pad y with one extra column so that new_y.shape[1] == y.shape[1] + 1
    return np.pad(y, ((0, 0), (0, 1)), 'edge')

train_predictions = padding_output(train_predictions.reshape(-1, 1))
train_predictions = scaler.inverse_transform(train_predictions)[:, 0]
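As a possible alternative to padding, the scaling of a single column can be undone by hand with the fitted scaler's min_ and scale_ attributes, since MinMaxScaler scales each column independently. The sketch below uses made-up toy data in place of the real file, and inverse_transform_column is a hypothetical helper name, not part of scikit-learn.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy stand-in for the two-feature training data (values are made up).
rng = np.random.default_rng(0)
train_features = rng.uniform(low=[10.0, -5.0], high=[50.0, 5.0], size=(100, 2))

scaler = MinMaxScaler()
train_scaled = scaler.fit_transform(train_features)

# Pretend these are model predictions for column 0, in scaled units.
scaled_predictions = train_scaled[:, :1]  # shape (100, 1)

def inverse_transform_column(fitted_scaler, values, column=0):
    """Undo MinMaxScaler for one column only (hypothetical helper)."""
    # MinMaxScaler computes X_scaled = X * scale_ + min_ per column,
    # so the inverse is (X_scaled - min_) / scale_ for that column.
    flat = np.asarray(values, dtype=float).reshape(-1)
    return (flat - fitted_scaler.min_[column]) / fitted_scaler.scale_[column]

restored = inverse_transform_column(scaler, scaled_predictions, column=0)
print(restored.shape)
```

On this toy data the result should agree with the padding trick above, because the padded column is discarded after the inverse transform anyway.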