How to normalize data in a Python array by column using SKLearn?
Question
I am coding a machine learning algorithm using Keras and I need to normalize my data before feeding it through. I have 3 inputs organised into a 2d array with each column making up an input.
import tensorflow as tf
import keras
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Activation, Dropout
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
#Importing all the required modules
raw_data = np.array([]) #Defining numpy array for training data
val_data = np.array([]) #Defining numpy array for validation data
test = np.array([]) #Defining numpy array for test data
rawfilepath = r'C:\Users\***\Desktop\***\Unprocessed_Data_For_Training.txt'
valfilepath = r'C:\Users\***\Desktop\***\Unprocessed_Data_For_Validation.txt'
testfilepath = r'C:\Users\***\Desktop\***\h4t6usedforprediction.txt' #Filepaths
raw_data = np.loadtxt(rawfilepath)
val_data = np.loadtxt(valfilepath)
test = np.loadtxt(testfilepath) #Loading contents of text files into their respective arrays
X = raw_data[:, 1:4] #Splitting the data, X contains the coordinate position, initial shear and initial
Y = raw_data[:, 0] #Splitting the data, Y contains the measured height
X_Val = val_data[:, 1:4]
Y_Val = val_data[:, 0]
X_test = test[:, 1:4]
Y_test = test[:, 0]
scalar = MinMaxScaler()
#print(X_test)
#print(Y_test)
print(X)
print(Y)
scaler = MinMaxScaler()
Xnorm = scaler.fit_transform(X)
Ynorm = scaler.fit_transform(Y.reshape(-1,1))
Xvalnorm = scaler.fit_transform(X_Val)
Yvalnorm = scaler.fit_transform(Y_Val.reshape(-1,1))
Xtestnorm = scaler.fit_transform(X_test)
Ytestnorm = scaler.fit_transform(Y_test.reshape(-1,1))
The Y variables are normalising fine; however, I think the X variables are being normalised over the whole array rather than column by column.
These are the inputs that the model is using to make predictions:
X=[0.94941569 0. 0. ], Predicted=[0.02409407]
X=[0.95664225 0. 0. ], Predicted=[0.02374389]
X=[0.93496738 0. 0. ], Predicted=[0.02480936]
X=[0.94219233 0. 0. ], Predicted=[0.02444912]
X=[0.92774402 0. 0. ], Predicted=[0.02517468]
X=[0.92052067 0. 0. ], Predicted=[0.02554525]
X=[0.91329892 0. 0. ], Predicted=[0.02592104]
X=[0.90607877 0. 0. ], Predicted=[0.02630214]
X=[0.89885863 0. 0. ], Predicted=[0.02668868]
X=[0.89163848 0. 0. ], Predicted=[0.02708073]
X=[0.88441994 0. 0. ], Predicted=[0.0274783]
X=[0.87720299 0. 0. ], Predicted=[0.02788144]
Answer 1
Score: 0
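Let's take this in two parts:
1 - If X and Y are your training set, calling fit_transform on that set is correct. But you must not call fit_transform again on your validation and test sets; only transform them with the scalers already fitted on the training data. Note that X and Y need separate scalers: refitting a single scaler on Y overwrites the statistics learned from X, so transforming the 3-column X_Val with it would use the wrong ranges (recent scikit-learn versions raise an error instead).
x_scaler = MinMaxScaler()  # fitted on the training inputs only
y_scaler = MinMaxScaler()  # fitted on the training targets only
Xnorm = x_scaler.fit_transform(X)
Ynorm = y_scaler.fit_transform(Y.reshape(-1, 1))
# Reuse the training-set statistics for the validation and test data
Xvalnorm = x_scaler.transform(X_Val)
Yvalnorm = y_scaler.transform(Y_Val.reshape(-1, 1))
Xtestnorm = x_scaler.transform(X_test)
Ytestnorm = y_scaler.transform(Y_test.reshape(-1, 1))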
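To illustrate why refitting on the test set matters, here is a minimal sketch on made-up data. It assumes, purely for illustration, that the second and third columns of the test file are constant; the actual file contents are not shown in the question, so this is only one possible explanation for the zeros in the posted inputs. The array values and the names refit and reused are hypothetical.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up training inputs: three columns with different ranges.
X_train = np.array([[0.0, 2.0, 10.0],
                    [5.0, 4.0, 20.0],
                    [10.0, 6.0, 30.0]])

# Made-up test inputs: columns 2 and 3 are constant (an assumption).
X_test = np.array([[2.5, 4.0, 20.0],
                   [7.5, 4.0, 20.0]])

# Refitting on the test set: a constant column has min == max,
# so MinMaxScaler maps every value in it to 0.
refit = MinMaxScaler().fit_transform(X_test)

# Fitting once on the training set and reusing that scaler
# keeps the training-set ranges for every column.
x_scaler = MinMaxScaler().fit(X_train)
reused = x_scaler.transform(X_test)

print(refit)   # constant columns collapse to 0
print(reused)  # constant columns keep a meaningful scaled value
```

With the refitted scaler the last two columns are all zeros, which resembles the inputs posted in the question; with the reused scaler they are scaled against the training ranges.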
2 - I am assuming the values of X you posted at the end are what you got after normalization. So I have created my_X just to show how to use sklearn to normalize some data:
my_X = np.array([[-3, 2, 4], [-6, 4, 1], [0, 10, 15], [12, 18, 31]])
scaler = MinMaxScaler()
scaler.fit_transform(my_X)
Just replace the values in my_X with the values you have in your X.
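As a quick check that MinMaxScaler works column by column, the sketch below compares its output with a manual calculation of x' = (x - min) / (max - min) taken per column (axis=0), and with the same formula applied over the whole array. The names manual and global_scaled are only illustrative.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

my_X = np.array([[-3, 2, 4], [-6, 4, 1], [0, 10, 15], [12, 18, 31]], dtype=float)

scaler = MinMaxScaler()
scaled = scaler.fit_transform(my_X)

# The fitted scaler stores one minimum and one maximum per column.
print(scaler.data_min_)
print(scaler.data_max_)

# Manual per-column min-max scaling.
col_min = my_X.min(axis=0)
col_max = my_X.max(axis=0)
manual = (my_X - col_min) / (col_max - col_min)

# Whole-array scaling, for contrast only.
global_scaled = (my_X - my_X.min()) / (my_X.max() - my_X.min())

print(np.allclose(scaled, manual))         # matches the per-column result
print(np.allclose(scaled, global_scaled))  # does not match whole-array scaling

# inverse_transform maps scaled values back to the original units,
# which is how normalized predictions can be converted back.
restored = scaler.inverse_transform(scaled)
print(np.allclose(restored, my_X))
```

If the first comparison holds and the second does not, the scaling is being done per column, so the zeros in the question are more likely to come from the data or from refitting than from whole-array scaling.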