2023-02-16 03:07:15

How to normalize data in a python array by column using SKLearn?

Question

I am coding a machine learning algorithm using Keras and need to normalize my data before feeding it in. I have three inputs organised into a 2D array, with each column holding one input.

    import tensorflow as tf
    import keras
    import numpy as np
    from keras.models import Sequential
    from keras.layers import Dense, Activation, Dropout
    import matplotlib.pyplot as plt
    from sklearn.preprocessing import MinMaxScaler
    #Importing all the required modules
    raw_data = np.array([]) #Defining numpy array for training data
    val_data = np.array([]) #Defining numpy array for validation data
    test = np.array([]) #Defining numpy array for test data
    rawfilepath = r'C:\Users\***\Desktop\***\Unprocessed_Data_For_Training.txt'
    valfilepath = r'C:\Users\***\Desktop\***\Unprocessed_Data_For_Validation.txt'
    testfilepath = r'C:\Users\***\Desktop\***\h4t6usedforprediction.txt' #Filepaths
    raw_data = np.loadtxt(rawfilepath)
    val_data = np.loadtxt(valfilepath)
    test = np.loadtxt(testfilepath) #Loading contents of text files into their respective arrays
    X = raw_data[:, 1:4] #Splitting the data, X contains the coordinate position, initial shear and initial  
    Y = raw_data[:, 0] #Splitting the data, Y contains the measured height
    X_Val = val_data[:, 1:4]
    Y_Val = val_data[:, 0]
    X_test = test[:, 1:4]
    Y_test = test[:, 0]
    scalar = MinMaxScaler()
    #print(X_test)
    #print(Y_test)
    print(X)
    print(Y)
    scaler = MinMaxScaler()
    Xnorm = scaler.fit_transform(X) 
    Ynorm = scaler.fit_transform(Y.reshape(-1,1))
    Xvalnorm = scaler.fit_transform(X_Val)
    Yvalnorm = scaler.fit_transform(Y_Val.reshape(-1,1))
    Xtestnorm = scaler.fit_transform(X_test)
    Ytestnorm = scaler.fit_transform(Y_test.reshape(-1,1))

The Y variables are normalising fine; however, I think the X variables are being normalised over the whole array rather than column by column.

These are the inputs that the model is using to make predictions.

X=[0.94941569 0.         0.        ], Predicted=[0.02409407]
X=[0.95664225 0.         0.        ], Predicted=[0.02374389]
X=[0.93496738 0.         0.        ], Predicted=[0.02480936]
X=[0.94219233 0.         0.        ], Predicted=[0.02444912]
X=[0.92774402 0.         0.        ], Predicted=[0.02517468]
X=[0.92052067 0.         0.        ], Predicted=[0.02554525]
X=[0.91329892 0.         0.        ], Predicted=[0.02592104]
X=[0.90607877 0.         0.        ], Predicted=[0.02630214]
X=[0.89885863 0.         0.        ], Predicted=[0.02668868]
X=[0.89163848 0.         0.        ], Predicted=[0.02708073]
X=[0.88441994 0.         0.        ], Predicted=[0.0274783]
X=[0.87720299 0.         0.        ], Predicted=[0.02788144]

Answer 1

Score: 0

Let's take this in parts:

1 - If X and Y are your training set, calling fit_transform on that set is correct. But you cannot call fit_transform again on your validation and test sets. You should only transform them, using the scaler you have already fitted on the training data:

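    # Use one scaler per array: refitting a single scaler on Y would discard
    # the per-column statistics it learned from X.
    x_scaler = MinMaxScaler()
    y_scaler = MinMaxScaler()
    Xnorm = x_scaler.fit_transform(X)
    Ynorm = y_scaler.fit_transform(Y.reshape(-1, 1))
    # Reuse the training statistics for the validation and test data.
    Xvalnorm = x_scaler.transform(X_Val)
    Yvalnorm = y_scaler.transform(Y_Val.reshape(-1, 1))
    Xtestnorm = x_scaler.transform(X_test)
    Ytestnorm = y_scaler.transform(Y_test.reshape(-1, 1))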
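As a minimal sketch of the fit-on-training, transform-elsewhere pattern, the example below uses small made-up arrays in place of the question's text files (all values and the names X_train, y_train, X_val, y_val, x_scaler and y_scaler are hypothetical). It keeps one scaler for the inputs and one for the target, since a scaler fitted on the one-column Y cannot be used to transform the three-column X.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical data standing in for the files loaded in the question.
X_train = np.array([[0.0, 10.0, 100.0],
                    [5.0, 20.0, 300.0],
                    [10.0, 30.0, 500.0]])
y_train = np.array([1.0, 2.0, 3.0])
X_val = np.array([[2.5, 15.0, 200.0]])
y_val = np.array([1.5])

# One scaler per array so each keeps its own min/max statistics.
x_scaler = MinMaxScaler()
y_scaler = MinMaxScaler()

# Learn the statistics from the training data only.
X_train_norm = x_scaler.fit_transform(X_train)
y_train_norm = y_scaler.fit_transform(y_train.reshape(-1, 1))

# Apply the training statistics to the validation data (no refitting).
X_val_norm = x_scaler.transform(X_val)
y_val_norm = y_scaler.transform(y_val.reshape(-1, 1))

# Map scaled targets (or model predictions) back to the original units.
y_val_restored = y_scaler.inverse_transform(y_val_norm)

print(X_val_norm)      # each entry is 0.25 for this made-up data
print(y_val_restored)  # back to 1.5
```

Keeping y_scaler around is also what lets you convert the model's scaled predictions back to measured heights with inverse_transform.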

2 - I am assuming the values of X you posted at the end are already the output of the normalization. So I have created my_X just to show how to use sklearn to normalize some data:

    my_X = np.array([[-3, 2, 4], [-6, 4, 1], [0, 10, 15], [12, 18, 31]])
    scaler = MinMaxScaler()
    scaler.fit_transform(my_X)

Just replace the values in my_X with the values you have in your X.
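To check that MinMaxScaler really works column by column, the sketch below compares its output on my_X with a manual per-column (x - min) / (max - min) computation. It also illustrates one possible reading of the zeros in the question's output. This is an assumption, because the data files are not shown: if the second and third columns are constant within the test file, refitting the scaler on that file maps them to 0, whereas reusing a scaler fitted on the training data does not. X_test_demo and its values are made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

my_X = np.array([[-3, 2, 4], [-6, 4, 1], [0, 10, 15], [12, 18, 31]], dtype=float)

# Scale with sklearn, then repeat the computation by hand per column (axis=0).
scaled = MinMaxScaler().fit_transform(my_X)
col_min = my_X.min(axis=0)
col_max = my_X.max(axis=0)
manual = (my_X - col_min) / (col_max - col_min)
print(np.allclose(scaled, manual))  # True: scaling is done column by column

# Hypothetical test set whose 2nd and 3rd columns never change.
X_test_demo = np.array([[1.0, 4.0, 16.0],
                        [2.0, 4.0, 16.0],
                        [3.0, 4.0, 16.0]])

# Refitting on the test set: constant columns have zero range and become 0.
refit = MinMaxScaler().fit_transform(X_test_demo)
print(refit)

# Reusing the scaler fitted on my_X keeps those columns informative.
train_scaler = MinMaxScaler().fit(my_X)
reused = train_scaler.transform(X_test_demo)
print(reused)
```

If the second printout resembles the X rows in the question, the zeros come from refitting on the test set rather than from scaling over the whole array.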
