How to solve broadcast issue in Deep Learning?

Question


I have a broadcasting issue in my code.

I got this error at Step 8:

ValueError: non-broadcastable output operand with shape (1062433,1) doesn't match
the broadcast shape (1062433,2)

and I have this code:
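(For reference, these are the imports the snippets below assume; they were not shown in the original post.)

import numpy as np
import pandas as pd
import tensorflow as tf
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.layers import (Dense, Dropout, Input,
                                     LayerNormalization, MultiHeadAttention)
from tensorflow.keras.models import Model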

Step 1: Read the data

df = pd.read_csv('file1.csv')

Step 2: Split the data into training and testing sets

train_end_date = df['Date'].max() - pd.DateOffset(years=5)
train_data = df[df['Date'] <= train_end_date]
test_data = df[df['Date'] > train_end_date]
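Note that this comparison only works if df['Date'] is already a datetime column; if read_csv left it as strings, the date arithmetic above would fail. A one-line fix, assuming the column is named Date:

# Assumption: Date was read as strings; parse it before the date arithmetic above.
df['Date'] = pd.to_datetime(df['Date'])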

Step 3: Normalize the data

scaler = MinMaxScaler()
cols_to_scale = ['feature1', 'feature2']
train_data_scaled = scaler.fit_transform(train_data[cols_to_scale])
test_data_scaled = scaler.transform(test_data[cols_to_scale])
print("Step 3")
print("train_data_scaled", train_data_scaled.shape)
print("test_data_scaled", test_data_scaled.shape)

Step 4: Prepare the input sequences and labels

def create_sequences(data, sequence_length):
    X, y = [], []
    for i in range(len(data) - sequence_length):
        X.append(data[i:i+sequence_length, :])  
        y.append(data[i+sequence_length, 0])  
    return np.array(X), np.array(y)

sequence_length = 10  
X_train, y_train = create_sequences(train_data_scaled, sequence_length)
X_test, y_test = create_sequences(test_data_scaled, sequence_length)
print("Step 4")
print("X_train", X_train.shape)
print("y_train", y_train.shape)
print("X_test", X_test.shape)
print("y_test", y_test.shape)
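As a sanity check, a toy run (made-up data) shows why the number of sequences is len(data) - sequence_length, matching the drop from 1062443 scaled rows to 1062433 training sequences:

# Toy example: 12 rows of 2 features -> 12 - 10 = 2 windows of length 10.
toy = np.arange(24, dtype=float).reshape(12, 2)
X_toy, y_toy = create_sequences(toy, sequence_length=10)
print(X_toy.shape)  # (2, 10, 2)
print(y_toy.shape)  # (2,) -- feature 0 of the row right after each window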

Step 5: Build the Transformer model

input_shape = (sequence_length, X_train.shape[2])
inputs = Input(shape=input_shape)
x = inputs
num_layers = 2
d_model = 32
num_heads = 4
dff = 64
dropout_rate = 0.1

for _ in range(num_layers):
    x = MultiHeadAttention(num_heads=num_heads, key_dim=d_model)(x, x)
    x = Dropout(dropout_rate)(x)
    x = LayerNormalization(epsilon=1e-6)(x)

x = tf.keras.layers.GlobalAveragePooling1D()(x)
x = Dense(units=1)(x)
model = Model(inputs=inputs, outputs=x)
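Note that the Dense(units=1) head makes the model emit a single value per input sequence, which is why the predictions in Step 7 come out with shape (N, 1):

print(model.output_shape)  # (None, 1) -- one feature per prediction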

Step 6: Compile and train the model

model.compile(optimizer='adam', loss='mean_squared_error', metrics=['mae'])
model.fit(X_train, y_train, epochs=1, batch_size=32)

Step 7: Evaluate the model

train_predictions = model.predict(X_train)
test_predictions = model.predict(X_test)
print("Step 7")
print("train_predictions at evaluation", train_predictions.shape)
print("test_predictions at evaluation", test_predictions.shape)

Step 8: Inverse transform the predictions to obtain the actual values - HERE IS THE ERROR

train_predictions = scaler.inverse_transform(train_predictions.reshape(-1, 1)).flatten()
test_predictions = scaler.inverse_transform(test_predictions.reshape(-1, 1)).flatten()
print("Step 8")
print("train_predictions after inversion", train_predictions.shape)
print("test_predictions after inversion", test_predictions.shape)

As asked in the comments, I printed the shapes at the most significant parts of the code to find out where it breaks. These are the results:

Step 3
train_data_scaled (1062443, 2)
test_data_scaled (308138, 2)
Step 4
X_train (1062433, 10, 2)
y_train (1062433,)
X_test (308128, 10, 2)
y_test (308128,)
33202/33202 [==============================] - 431s 13ms/step - loss: 2.1277e-05 - mae: 4.1088e-04
33202/33202 [==============================] - 203s 6ms/step
9629/9629 [==============================] - 55s 6ms/step
Step 7
train_predictions at evaluation (1062433, 1)
test_predictions at evaluation (308128, 1)

The shape clearly changes somewhere along the way, but how can I solve this issue?

Thanks so much.

Answer 1

Score: 1

The root cause of the error is a shape mismatch between the tensors involved.

See Step 3

cols_to_scale = ['feature1', 'feature2']  
train_data_scaled = scaler.fit_transform(train_data[cols_to_scale])
# train_data_scaled has the shape of (1062443, 2)

Note that train_data_scaled has 2 features.

See Steps 7-8

train_predictions = model.predict(X_train)
# train_predictions has the shape of (1062433, 1)
train_predictions = scaler.inverse_transform(train_predictions.reshape(-1, 1)).flatten()

Note that the output data (predictions of the model) has only one feature.

Therefore, it throws an exception:

ValueError: non-broadcastable output operand with shape (1062433,1) doesn't match 
the broadcast shape (1062433,2)

because scaler.inverse_transform() expects its input to have the same number of features as train_data_scaled, the array that was used to fit the scaler.
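Here is a minimal sketch (with random placeholder data) that reproduces the same error in isolation:

import numpy as np
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()
scaler.fit(np.random.rand(100, 2))   # the scaler now expects 2 features

preds = np.random.rand(100, 1)       # model output has only 1 feature
scaler.inverse_transform(preds)      # ValueError: non-broadcastable output operand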

Solution

I assume that the output feature of the neural net has the same meaning as the input features, because of the way you prepare them in create_sequences().

You can trick inverse_transform() by padding the output tensor train_predictions with arbitrary values, so that train_predictions has the 2 features the scaler expects.

import numpy as np

def padding_output(y):
    # Pad y with a copy of its last column so that new_y.shape[1] == y.shape[1] + 1
    return np.pad(y, ((0, 0), (0, 1)), 'edge')

train_predictions = padding_output(train_predictions.reshape(-1, 1))
# Invert the scaling, then keep only the first column (the actual prediction)
train_predictions = scaler.inverse_transform(train_predictions)[:, 0]

Reference: numpy.pad
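Alternatively, you can skip the padding trick and undo the scaling for the target column by hand, using the per-column statistics the scaler stores. A sketch, assuming the target is column 0 and the default feature_range=(0, 1):

# Starting from the raw model output: model.predict(X_train) has shape (N, 1).
preds = model.predict(X_train).flatten()
# With feature_range=(0, 1), transform is (x - data_min_) / data_range_,
# so the inverse for column 0 (the target) is:
preds = preds * scaler.data_range_[0] + scaler.data_min_[0]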

