Unable to capture pattern in a Neural network

Question

I am trying to train a neural network for learning purposes using tensorflow.keras.
The network should take in a row vector of size 100.
The row vector values are all 0 except for a single element with value 1; the neural network should return the position index of the element with 1.

Example:

If the input row vector is [0,0,0,0,1,0,...,0,0], then the output should be 5 (the starting index is 1).

I have artificially created a training set containing row vectors in which the position of the 1 lies within the range 30 to 70.
Similarly, a test/validation set has been created for the range 10 to 90.

The problem is that the model fits only the training set and is unable to recognize the actual pattern, resulting in high, plateauing validation losses.

Theoretically, the optimal solution should be a single input layer (of size 100, taking in all row vector values) and a single-neuron layer with weights/kernel and bias of [1,2,3,4,...,100] and [0].
So the question is: why is the descent not going towards the optimal weights and bias, and how can I overcome it?
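As a quick check (a sketch of mine, not part of the original post), manually plugging those weights into a single-neuron tensorflow.keras model does reproduce the index exactly:

import numpy as np
from tensorflow.keras import layers, models

# build the single linear neuron and set the hand-picked solution
check = models.Sequential([layers.Dense(units=1, input_shape=(100,))])
check.set_weights([np.arange(1, 101, dtype='float32').reshape(100, 1),  # kernel [1, 2, ..., 100]
                   np.zeros(1, dtype='float32')])                       # bias 0

x = np.zeros((1, 100), dtype='float32')
x[0, 4] = 1                                   # 1 at position 5 (1-based)
print(check.predict(x, verbose=0))            # -> [[5.]]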

What I have tried

I tried increasing the number of neurons and layers and decreasing the learning rate.

My essential code segment is given below:

from tensorflow.keras import layers, models, optimizers
import numpy as np

# creating the training set: one-hot column vectors with the 1 at positions 30..70
blank = np.zeros((100, 1), dtype='uint8')
x_train, y_train = [], []
for i in range(30, 71):
    img = blank.copy()
    img[i] = 1
    x_train.append(img)
    y_train.append(i + 1)          # target is the 1-based index of the 1
x_train, y_train = np.array(x_train), np.array(y_train)

# creating the test set: same construction, but positions 10..90
x_test, y_test = [], []
for i in range(10, 91):
    img = blank.copy()
    img[i] = 1
    x_test.append(img)
    y_test.append(i + 1)
x_test, y_test = np.array(x_test), np.array(y_test)

# defining and fitting the neural network: a single linear neuron on the flattened input
ann = models.Sequential([
    layers.Flatten(input_shape=(100, 1)),
    layers.Dense(units=1)])
ann.compile(optimizer=optimizers.Adam(learning_rate=0.1, beta_1=0.9, beta_2=0.999),
            loss='mean_squared_error', metrics=['mae'])
history = ann.fit(x_train, y_train, epochs=25, batch_size=4,
                  validation_data=(x_test, y_test), verbose=0)

# printing the final outcomes (loss is MSE, error is MAE)
train_loss, train_error = ann.evaluate(x_train, y_train, verbose=0)
print(f'train set loss {train_loss:.2f} error {train_error:.2f}')

test_loss, test_error = ann.evaluate(x_test, y_test, verbose=0)
print(f'test set loss {test_loss:.2f} error {test_error:.2f}')

Graphs for reference: learning curve and truth-model plot.

Answer 1

Score: 1

Your row vector of size 100 is fed into a neural network with a single neuron in one layer and no activation.
This is equivalent to a simple, direct linear expression:

y = w1*x1 + w2*x2 + w3*x3 + ... + w100*x100 + b

Note that we have 100 w variables and 1 b variable, but only 41 training samples, which can also be viewed as 41 equations.

Mathematically, since we have 101 variables and only 41 equations, there are infinitely many solutions for the variables, and that is what your neural network is finding.

Based on the initial random values of the weights and biases, your neural network latches onto the closest solution that holds for your 41 training samples.

In conclusion, you cannot reach your so-called optimum solution with conventional descent.
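A small numpy sketch (mine, not from the answer) illustrating the point: the 41 one-hot training rows plus a bias column form a rank-41 linear system in 101 unknowns, so infinitely many (w, b) fit the training data exactly:

import numpy as np

X = np.zeros((41, 100))
X[np.arange(41), np.arange(30, 71)] = 1      # one-hot rows, 1 at positions 30..70 (0-based)
A = np.hstack([X, np.ones((41, 1))])         # append a column of ones for the bias term
print(np.linalg.matrix_rank(A))              # -> 41, far fewer than the 101 unknowns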

Answer 2

Score: 0

The model should be Dense(100) (or similar) with a categorical loss, since you are asking your NN for the most suitable index from 1 to 100. Your model currently computes activation(sum(x[i] * weight[i])). Given that every x except one is 0 in the dataset, you end up with activation(weight[i]), which is mostly (due to activation-function specifics) between [-1; 1].
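A hedged sketch of the classification setup this answer suggests (the softmax activation and sparse_categorical_crossentropy loss are my assumptions, not spelled out in the answer); the labels would then be the 0-based index of the 1, i.e. y - 1:

from tensorflow.keras import layers, models

clf = models.Sequential([
    layers.Flatten(input_shape=(100, 1)),
    layers.Dense(100, activation='softmax')])   # one output per possible index
clf.compile(optimizer='adam',
            loss='sparse_categorical_crossentropy',
            metrics=['accuracy'])
# usage: clf.fit(x_train, y_train - 1, epochs=25, validation_data=(x_test, y_test - 1))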
