How to solve broadcast issue in Deep Learning?

Question

I have an issue with broadcasting in my code.

I got this error at Step 8:

ValueError: non-broadcastable output operand with shape (1062433,1) doesn't match
the broadcast shape (1062433,2)

Here is my code:

Step 1: Read the data

import numpy as np
import pandas as pd
import tensorflow as tf
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.layers import (Dense, Dropout, Input,
                                     LayerNormalization, MultiHeadAttention)
from tensorflow.keras.models import Model

# parse_dates is needed so the date arithmetic in Step 2 works
df = pd.read_csv('file1.csv', parse_dates=['Date'])

Step 2: Split the data into training and testing sets

train_end_date = df['Date'].max() - pd.DateOffset(years=5)
train_data = df[df['Date'] <= train_end_date]
test_data = df[df['Date'] > train_end_date]

Step 3: Normalize the data

scaler = MinMaxScaler()
cols_to_scale = ['feature1', 'feature2']
train_data_scaled = scaler.fit_transform(train_data[cols_to_scale])
test_data_scaled = scaler.transform(test_data[cols_to_scale])
print("Step 3")
print("train_data_scaled", train_data_scaled.shape)
print("test_data_scaled", test_data_scaled.shape)

Step 4: Prepare the input sequences and labels

def create_sequences(data, sequence_length):
    X, y = [], []
    for i in range(len(data) - sequence_length):
        X.append(data[i:i+sequence_length, :])  
        y.append(data[i+sequence_length, 0])  
    return np.array(X), np.array(y)

sequence_length = 10  
X_train, y_train = create_sequences(train_data_scaled, sequence_length)
X_test, y_test = create_sequences(test_data_scaled, sequence_length)
print("Step 4")
print("X_train", X_train.shape)
print("y_train", y_train.shape)
print("X_test", X_test.shape)
print("y_test", y_test.shape)

Step 5: Build the Transformer model

input_shape = (sequence_length, X_train.shape[2])
inputs = Input(shape=input_shape)
x = inputs
num_layers = 2
d_model = 32
num_heads = 4
dff = 64
dropout_rate = 0.1

for _ in range(num_layers):
    x = MultiHeadAttention(num_heads=num_heads, key_dim=d_model)(x, x)
    x = Dropout(dropout_rate)(x)
    x = LayerNormalization(epsilon=1e-6)(x)

x = tf.keras.layers.GlobalAveragePooling1D()(x)
x = Dense(units=1)(x)
model = Model(inputs=inputs, outputs=x)

Step 6: Compile and train the model

model.compile(optimizer='adam', loss='mean_squared_error', metrics=['mae'])
model.fit(X_train, y_train, epochs=1, batch_size=32)

Step 7: Evaluate the model

train_predictions = model.predict(X_train)
test_predictions = model.predict(X_test)
print("Step 7")
print("train_predictions at evaluation", train_predictions.shape)
print("test_predictions at evaluation", test_predictions.shape)

Step 8: Inverse transform the predictions to obtain the actual values - HERE IS THE ERROR

train_predictions = scaler.inverse_transform(train_predictions.reshape(-1, 1)).flatten()
test_predictions = scaler.inverse_transform(test_predictions.reshape(-1, 1)).flatten()
print("Step 8")
print("train_predictions after inversion", train_predictions.shape)
print("test_predictions after inversion", test_predictions.shape)

As asked in the comments section, I printed the shapes at the most significant points in the code to find out where it breaks. These are the results:

Step 3
train_data_scaled (1062443, 2)
test_data_scaled (308138, 2)
Step 4
X_train (1062433, 10, 2)
y_train (1062433,)
X_test (308128, 10, 2)
y_test (308128,)
33202/33202 [==============================] - 431s 13ms/step - loss: 2.1277e-05 - mae: 4.1088e-04
33202/33202 [==============================] - 203s 6ms/step
9629/9629 [==============================] - 55s 6ms/step
Step 7
train_predictions at evaluation (1062433, 1)
test_predictions at evaluation (308128, 1)

The shape definitely changes by the time the model makes predictions, but how can I solve this issue?

Thanks so much.

Answer 1

Score: 1

The root cause of the error is a shape mismatch between these arrays.

See Step 3

cols_to_scale = ['feature1', 'feature2']  
train_data_scaled = scaler.fit_transform(train_data[cols_to_scale])
# train_data_scaled has the shape of (1062443, 2)

Note that train_data_scaled has 2 features.

See Steps 7-8

train_predictions = model.predict(X_train)
# train_predictions has the shape of (1062433, 1)
train_predictions = scaler.inverse_transform(train_predictions.reshape(-1, 1)).flatten()

Note that the output data (predictions of the model) has only one feature.

Therefore, it throws an exception

ValueError: non-broadcastable output operand with shape (1062433,1) doesn't match 
the broadcast shape (1062433,2)

because scaler.inverse_transform() expects its input to have the same number of features as train_data_scaled, the array that was used to fit the scaler.
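
To see the failure in isolation, here is a minimal sketch with made-up data (the row count of 100 is an arbitrary assumption for illustration) that reproduces the same ValueError:

import numpy as np
from sklearn.preprocessing import MinMaxScaler

demo_scaler = MinMaxScaler()
demo_scaler.fit(np.random.rand(100, 2))    # fitted on 2 features, like train_data_scaled

one_column = np.random.rand(100, 1)        # shaped like the model's predictions
demo_scaler.inverse_transform(one_column)  # ValueError: non-broadcastable output operand
                                           # with shape (100,1) ... broadcast shape (100,2)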

Solution

I assume that the model's output feature has the same meaning as the first input feature, because of the way you prepare the labels in create_sequences().

You can trick inverse_transform() by padding the output array train_predictions with arbitrary values, so that it has the 2 features the scaler expects.

import numpy as np

def padding_output(y):
    # Pad a dummy second column (repeating the edge value of each row)
    # so that the result has y.shape[1] + 1 features.
    return np.pad(y, ((0, 0), (0, 1)), 'edge')

# pad the (N, 1) predictions to (N, 2), invert the scaling,
# then keep only the real (first) column
train_predictions = padding_output(train_predictions.reshape(-1, 1))
train_predictions = scaler.inverse_transform(train_predictions)[:, 0]
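
The same two lines fix the test-set predictions from Step 8:

test_predictions = padding_output(test_predictions.reshape(-1, 1))
test_predictions = scaler.inverse_transform(test_predictions)[:, 0]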

Reference: numpy.pad
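
Alternatively, you can skip the padding and undo the scaling of the first column by hand. This is a minimal sketch that relies on the fitted min_ and scale_ attributes of sklearn's MinMaxScaler, whose transform is X * scale_ + min_ applied column-wise:

# model.predict() output has shape (N, 1); invert column 0 directly:
# X_scaled = X * scale_[0] + min_[0]  =>  X = (X_scaled - min_[0]) / scale_[0]
train_predictions = (train_predictions.flatten() - scaler.min_[0]) / scaler.scale_[0]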

