How to normalize data in a Python array by column using SKLearn?

Question

I am coding a machine learning algorithm using Keras and I need to normalize my data before feeding it through. I have 3 inputs organised into a 2D array, with each column making up one input.

    import tensorflow as tf
    import keras
    import numpy as np
    from keras.models import Sequential
    from keras.layers import Dense, Activation, Dropout
    import matplotlib.pyplot as plt
    from sklearn.preprocessing import MinMaxScaler
    #Importing all the required modules

    raw_data = np.array([]) #Defining numpy array for training data
    val_data = np.array([]) #Defining numpy array for validation data
    test = np.array([]) #Defining numpy array for test data
    rawfilepath = r'C:\Users\***\Desktop\***\Unprocessed_Data_For_Training.txt'
    valfilepath = r'C:\Users\***\Desktop\***\Unprocessed_Data_For_Validation.txt'
    testfilepath = r'C:\Users\***\Desktop\***\h4t6usedforprediction.txt' #Filepaths 
    raw_data = np.loadtxt(rawfilepath)
    val_data = np.loadtxt(valfilepath)
    test = np.loadtxt(testfilepath) #Loading contents of text files into their respective arrays
    X = raw_data[:, 1:4] #Splitting the data, X contains the coordinate position, initial shear and initial  
    Y = raw_data[:, 0] #Splitting the data, Y contains the measured height
    X_Val = val_data[:, 1:4]
    Y_Val = val_data[:, 0]
    X_test = test[:, 1:4]
    Y_test = test[:, 0]
    scalar = MinMaxScaler()
    #print(X_test)
    #print(Y_test)
    print(X)
    print(Y)

    scaler = MinMaxScaler()
    Xnorm = scaler.fit_transform(X) 
    Ynorm = scaler.fit_transform(Y.reshape(-1,1))
    Xvalnorm = scaler.fit_transform(X_Val)
    Yvalnorm = scaler.fit_transform(Y_Val.reshape(-1,1))
    Xtestnorm = scaler.fit_transform(X_test)
    Ytestnorm = scaler.fit_transform(Y_test.reshape(-1,1))

The Y variables are normalising fine; however, I think the X variables are being normalised over the whole array rather than column by column.

These are the inputs that the model is using to make predictions:

X=[0.94941569 0.         0.        ], Predicted=[0.02409407]
X=[0.95664225 0.         0.        ], Predicted=[0.02374389]
X=[0.93496738 0.         0.        ], Predicted=[0.02480936]
X=[0.94219233 0.         0.        ], Predicted=[0.02444912]
X=[0.92774402 0.         0.        ], Predicted=[0.02517468]
X=[0.92052067 0.         0.        ], Predicted=[0.02554525]
X=[0.91329892 0.         0.        ], Predicted=[0.02592104]
X=[0.90607877 0.         0.        ], Predicted=[0.02630214]
X=[0.89885863 0.         0.        ], Predicted=[0.02668868]
X=[0.89163848 0.         0.        ], Predicted=[0.02708073]
X=[0.88441994 0.         0.        ], Predicted=[0.0274783]
X=[0.87720299 0.         0.        ], Predicted=[0.02788144]

Answer 1

Score: 0

Let's take this in parts:

1 - If X and Y are your training set, calling fit_transform on them is correct. However, you must not call fit_transform again on your validation and test sets; just transform them with the scalers you have already fitted. Use a separate scaler for X and for Y, because fitting the same scaler on Y overwrites what it learned from X:

    x_scaler = MinMaxScaler()  # fitted on the training inputs only
    y_scaler = MinMaxScaler()  # fitted on the training targets only
    Xnorm = x_scaler.fit_transform(X)
    Ynorm = y_scaler.fit_transform(Y.reshape(-1, 1))
    Xvalnorm = x_scaler.transform(X_Val)
    Yvalnorm = y_scaler.transform(Y_Val.reshape(-1, 1))
    Xtestnorm = x_scaler.transform(X_test)
    Ytestnorm = y_scaler.transform(Y_test.reshape(-1, 1))
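
As a rough illustration of the fit-on-train, transform-on-the-rest pattern, here is a minimal sketch using made-up arrays in place of the asker's files. It assumes one scaler per array (x_scaler and y_scaler are illustrative names, not from the question), since a MinMaxScaler that has been re-fitted on Y no longer holds the column statistics of X.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up stand-ins for the training and validation arrays (3 input columns).
X = np.array([[0.0, 10.0, 100.0],
              [5.0, 20.0, 300.0],
              [10.0, 30.0, 500.0]])
Y = np.array([1.0, 2.0, 3.0])
X_Val = np.array([[2.5, 15.0, 200.0],
                  [7.5, 25.0, 400.0]])
Y_Val = np.array([1.5, 2.5])

# One scaler per array, each fitted on the training data only.
x_scaler = MinMaxScaler()
y_scaler = MinMaxScaler()
Xnorm = x_scaler.fit_transform(X)
Ynorm = y_scaler.fit_transform(Y.reshape(-1, 1))

# Validation data reuses the training min/max; there is no second fit.
Xvalnorm = x_scaler.transform(X_Val)
Yvalnorm = y_scaler.transform(Y_Val.reshape(-1, 1))

print(Xvalnorm)
print(Yvalnorm)
```

Because the validation rows are scaled with the training minimum and maximum, values outside the training range would fall outside [0, 1]; that is expected with this approach.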

2 - I am assuming the values of X you posted at the end are already the result of the normalization. So I have created my_X just to show how to use sklearn to normalize some data:

    import numpy as np
    from sklearn.preprocessing import MinMaxScaler

    my_X = np.array([[-3, 2, 4], [-6, 4, 1], [0, 10, 15], [12, 18, 31]])
    scaler = MinMaxScaler()
    print(scaler.fit_transform(my_X))  # each column is scaled to [0, 1] independently

Just replace the values in my_X with the values you have in your X.
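
To check that MinMaxScaler really works column by column, the sketch below compares its output with the per-column formula (x - min) / (max - min) computed by hand. It also shows what happens to a column that holds a single repeated value. That might be why the second and third inputs in the question come out as 0 when the scaler is fitted on the test set, but this is only an assumption, since the contents of that file are not shown. The array constant_cols is invented for the demonstration.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

my_X = np.array([[-3, 2, 4], [-6, 4, 1], [0, 10, 15], [12, 18, 31]], dtype=float)

# Scale with sklearn, then repeat the calculation by hand, column by column.
scaled = MinMaxScaler().fit_transform(my_X)
col_min = my_X.min(axis=0)
col_max = my_X.max(axis=0)
manual = (my_X - col_min) / (col_max - col_min)
print(np.allclose(scaled, manual))

# Hypothetical data: the last two columns never change, so max == min there.
constant_cols = np.array([[1.0, 7.0, 7.0],
                          [2.0, 7.0, 7.0],
                          [3.0, 7.0, 7.0]])
constant_scaled = MinMaxScaler().fit_transform(constant_cols)
print(constant_scaled)
```

The first print should report True, and the constant columns come out as all zeros while the varying column spans 0 to 1.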

huangapple
  • Posted on February 16, 2023, 03:07:15
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75464414.html