Why does my neural network not learn the XOR problem?

Question

I'm new to PyTorch and tried to learn the XOR problem (with some noise).

Of course I know that I have to use multiple layers and a non-linearity in between. But my network still doesn't learn anything, so I assume there is a mistake in my PyTorch code. The weights just don't change at all. Please help!

Here is my code:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs  # for data generation
from sklearn.model_selection import train_test_split
import torch

X, y = make_blobs(n_samples=200, n_features=2, cluster_std=.1, centers=[(1,1), (1,0), (0,0),(0,1)])
y[y==2]=0
y[y==3]=1

x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=19)

# np->torch
x_train = torch.FloatTensor(x_train)
x_test = torch.FloatTensor(x_test)
y_train = torch.FloatTensor(y_train)
y_test = torch.FloatTensor(y_test)

class XOR(torch.nn.Module): 
    def __init__(self): 
        super(XOR, self).__init__()
        self.layer1 = torch.nn.Linear(2,2)
        self.layer2 = torch.nn.Linear(2, 1)
        self.non_linear = torch.nn.Sigmoid() 
    def forward(self, x): 
        output = self.layer1(x)
        output = self.non_linear(output)
        output = self.layer2(output)
        output = self.non_linear(output)
        return output

model = XOR()
criterion = torch.nn.BCELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

model.train()  # set to training mode
epoch = 50
for e in range(epoch):
    # forward pass
    y_pred = model(x_train)
    # compute the loss
    loss = criterion(y_pred.flatten(), y_train)
    optimizer.zero_grad()
    print('Epoch {}: train loss: {}'.format(e, loss.item()))
    # backward pass
    loss.backward()
    # gradient update step
    optimizer.step()

model.eval()  # set model to eval mode

# train accuracy
y_pred = model(x_train)  # predict
y_pred = (y_pred > 0.5).int().flatten()  # threshold at 0.5 to get class labels
train_acc = torch.sum(y_pred == y_train.int())/y_train.shape[0]
print("train ACC: ",train_acc.float())

# test accuracy
y_pred = model(x_test)  # predict
y_pred = (y_pred > 0.5).int().flatten()  # threshold at 0.5 to get class labels
test_acc = torch.sum(y_pred == y_test.int())/y_test.shape[0]
print("test ACC: ",test_acc.float())

I tried increasing the learning rate, adding more layers, and adding more neurons. None of it worked.
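
One quick way to check whether the weights really stay frozen is to compare a parameter before and after a single optimizer step; a minimal sketch, reusing model, criterion, optimizer and the training data from above:

before = model.layer1.weight.detach().clone()
optimizer.zero_grad()
loss = criterion(model(x_train).flatten(), y_train)
loss.backward()
print("grad norm:", model.layer1.weight.grad.norm().item())  # near zero would point to vanishing gradients
optimizer.step()
print("weight change:", (model.layer1.weight.detach() - before).norm().item())  # > 0 means the weights do move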

Answer 1

Score: 1

Technically there's nothing wrong with your approach. You just have a very tiny dataset (so the gradient estimates are poor), a small learning rate (you move too slowly toward the global minimum), and too few epochs (you never get close enough to the global minimum). If you change any of these, you'll see immediate improvements. That said, if you want to approach this more realistically, here are some observations and suggestions.

Your non-linearity isn't a very good choice.

Sigmoid is typically the non-linearity applied to the output layer for classification tasks; consider ReLU for the hidden layer. Sigmoid applied to hidden layers can make gradients vanish, which becomes a massive information bottleneck in backprop.
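
To see the effect numerically: the sigmoid's derivative is at most 0.25 and shrinks quickly once the input saturates. A small illustration (just a sketch, not part of the original code):

import torch

x = torch.tensor([0.0, 2.0, 5.0, 10.0], requires_grad=True)
torch.sigmoid(x).sum().backward()
print(x.grad)  # roughly [0.25, 0.105, 0.0066, 0.000045] -- gradients vanish as sigmoid saturates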

Your hidden size is far too small.

You can achieve better-than-random performance with a hidden size of 2, but larger values will get you much further much faster.

Consider using MSE and removing the output non-linearity.

Treating this as a regression problem instead of a classification problem makes it easier to learn. You can still use BCE, but it will take much longer because the learning signal is sparser.
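
If you'd rather keep the classification framing, one option (not used in my code below) is to drop the final sigmoid from forward() and use BCEWithLogitsLoss, which applies the sigmoid internally in a numerically stable way; roughly:

criterion = torch.nn.BCEWithLogitsLoss()     # expects raw logits, so no sigmoid on the output layer
logits = model(x_train)                      # forward() now returns the last Linear layer's output directly
loss = criterion(logits.flatten(), y_train)  # targets are floats in {0., 1.}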

Your learning rate is too low.

This is an easy problem with a pretty clean loss landscape. Take big steps!

Here's my final code (which achieves perfect train and test accuracy):

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs  # for data generation
from sklearn.model_selection import train_test_split
import torch

X, y = make_blobs(n_samples=200, n_features=2, cluster_std=.1,
                  centers=[(1,1), (1,0), (0,0), (0,1)])
y[y==2] = 0
y[y==3] = 1

x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=19)

# np -> torch
x_train = torch.FloatTensor(x_train)
x_test = torch.FloatTensor(x_test)
y_train = torch.FloatTensor(y_train)
y_test = torch.FloatTensor(y_test)

class XOR(torch.nn.Module):
    def __init__(self):
        super(XOR, self).__init__()
        self.layer1 = torch.nn.Linear(2, 10)
        self.layer2 = torch.nn.Linear(10, 1)
        self.non_linear = torch.nn.ReLU()
    def forward(self, x):
        output = self.layer1(x)
        output = self.non_linear(output)
        output = self.layer2(output)
        return output

model = XOR()
criterion = torch.nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

model.train()  # set to train mode
epoch = 100
for e in range(epoch):
    optimizer.zero_grad()
    # forward pass
    y_pred = model(x_train)
    # compute the loss
    loss = criterion(y_pred.squeeze(), y_train)
    print('Epoch {}: train loss: {}'.format(e, loss.item()))
    # backward pass
    loss.backward()
    # gradient update step
    optimizer.step()

model.eval()  # set model to eval mode

# train accuracy
y_pred = model(x_train)                  # predict
y_pred = (y_pred > 0.5).int().flatten()  # threshold at 0.5 to get class labels
train_acc = torch.mean((y_pred == y_train.int()).float())
print("train ACC: ", train_acc.float())

# test accuracy
y_pred = model(x_test)                   # predict
y_pred = (y_pred > 0.5).int().flatten()  # threshold at 0.5 to get class labels
test_acc = torch.mean((y_pred == y_test.int()).float())
print("test ACC: ", test_acc.float())

Answer 2

Score: 0

I got somewhat different results by swapping in the scikit-learn multi-layer perceptron:

# see https://stackoverflow.com/questions/75898473/why-does-my-neural-network-not-learn-the-xor-problem?noredirect=1#comment133873645_75898473
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

if __name__ == '__main__':
    X, y = make_blobs(n_samples=200, n_features=2, cluster_std=.1, centers=[(1,1), (1,0), (0,0), (0,1)])
    y[y == 2] = 0
    y[y == 3] = 1
    plt.scatter(X[:, 0], X[:, 1], c=y, s=25, edgecolors='k')
    plt.show()

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=19)
    model = MLPClassifier(hidden_layer_sizes=20, learning_rate='adaptive', epsilon=0.01)
    model.fit(X_train, y_train)

    # train
    y_pred_train = model.predict(X_train)
    plt.scatter(X_train[:, 0], X_train[:, 1], c=y_train, s=25, edgecolors='k')
    plt.show()
    train_acc = np.sum(y_pred_train == y_train) / y_train.shape[0]
    print("train accuracy: ", train_acc)

    # test
    y_pred_test = model.predict(X_test)
    plt.scatter(X_test[:, 0], X_test[:, 1], c=y_test, s=25, edgecolors='k')
    plt.show()
    test_acc = np.sum(y_pred_test == y_test) / y_test.shape[0]
    print("test  accuracy: ", test_acc)

Here are the results:

train accuracy:  0.8134328358208955
test  accuracy:  0.7727272727272727

Scatter plot for train:

(scatter plot image)

Scatter plot for test:

(scatter plot image)

You could play around with the hidden layer sizes and epsilon to see what effect they have on accuracy.

I can make both train and test accuracies equal to 1.0 if I take the default values for MLPClassifier, but I'm sure it's just overfitting and memorizing the data.
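
For reference, a small sweep over those two settings could look like this (just a sketch, assuming the same X_train / X_test split as above):

# try a few hidden-layer sizes and epsilon values and report test accuracy for each
for hidden in (5, 20, 100):
    for eps in (1e-8, 1e-2):
        clf = MLPClassifier(hidden_layer_sizes=hidden, learning_rate='adaptive', epsilon=eps)
        clf.fit(X_train, y_train)
        print(f"hidden={hidden}, epsilon={eps}: test accuracy={clf.score(X_test, y_test):.3f}")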
