How to rescale new data based on an old MinMaxScaler?
Question
I'm stuck on the problem of scaling new data. In my setup, I have trained and tested the model, with all of x_train and x_test scaled using sklearn's MinMaxScaler(). When I apply the model in a real-time process, how can I scale the new input to the same scale as the training and testing data?
The steps are as follows:
featuresData = df[features].values  # array of all features, with a length in the thousands
sc = MinMaxScaler(feature_range=(-1, 1), copy=False)
featuresData = sc.fit_transform(featuresData)
# Train to produce the final model
model.fit(X, Y)
model.predict(X_test)
# Save the model to abcxyz.h5
Then, with new data:
# Load the model abcxyz.h5
# Receive new data
# Scale the new data to feed into the loaded model << I'm stuck at this step
# ...
So how do I scale the new data for prediction and then inverse transform the prediction to get the final result? As I understand it, the new data needs to be scaled in the same way as by the old scaler that was fitted before training the model.
Answer 1
Score: 10
Given how you used scikit-learn, you need to have saved the fitted transformer when you trained the model:
import joblib
from sklearn.preprocessing import MinMaxScaler
# ...
sc = MinMaxScaler(feature_range=(-1,1), copy=False)
featuresData = sc.fit_transform(featuresData)
joblib.dump(sc, 'sc.joblib')
# with new data
sc = joblib.load('sc.joblib')
transformData = sc.transform(newData)
# ...
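As a self-contained illustration of the save and load steps above, here is a minimal sketch. The synthetic arrays, the temporary file location, and the variable names are assumptions made for the example and do not come from the question. It also shows inverse_transform, which maps scaled values back to the original units.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for df[features].values (assumption: 3 numeric features).
rng = np.random.default_rng(0)
train_features = rng.normal(size=(100, 3))

# Training side: fit the scaler on the training data only, then persist it.
scaler = MinMaxScaler(feature_range=(-1, 1))
train_scaled = scaler.fit_transform(train_features)

scaler_path = os.path.join(tempfile.mkdtemp(), "sc.joblib")
joblib.dump(scaler, scaler_path)

# Real-time side: load the scaler and reuse its fitted minimum and maximum.
loaded_scaler = joblib.load(scaler_path)
new_data = rng.normal(size=(5, 3))
new_scaled = loaded_scaler.transform(new_data)  # transform only, never fit again

# inverse_transform maps scaled values back to the original units.
restored = loaded_scaler.inverse_transform(new_scaled)
```

The important detail is that the real-time side only ever calls transform. Calling fit or fit_transform on the new data would replace the minimum and maximum learned from the training set.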
A better approach in scikit-learn is to combine your transformations with your model in a Pipeline. That way, you only need to save a single object, the pipeline, which already includes the scaling step.
from sklearn import svm
from sklearn.preprocessing import MinMaxScaler
from sklearn.pipeline import Pipeline
clf = svm.SVC(kernel='linear')
sc = MinMaxScaler(feature_range=(-1,1), copy=False)
model = Pipeline([('scaler', sc), ('svc', clf)])
#...
When you call model.fit, the pipeline first runs fit_transform on your scaler under the hood and then fits the classifier on the scaled data. When you call model.predict, the pipeline applies the scaler's transform to the input before predicting.
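A fuller sketch of the pipeline approach, including saving and reloading it, might look like the following. The synthetic data, the file name pipeline.joblib, and the variable names are assumptions made for illustration.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Synthetic, linearly separable data (assumption, for illustration only).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 4))
y_train = (X_train[:, 0] + X_train[:, 1] > 0).astype(int)

pipeline = Pipeline([
    ("scaler", MinMaxScaler(feature_range=(-1, 1))),
    ("svc", SVC(kernel="linear")),
])
# Fits the scaler first, then fits the classifier on the scaled data.
pipeline.fit(X_train, y_train)

# Persist the whole pipeline: the scaler and the classifier travel together.
pipeline_path = os.path.join(tempfile.mkdtemp(), "pipeline.joblib")
joblib.dump(pipeline, pipeline_path)

# Real-time side: raw, unscaled input goes straight into predict().
loaded_pipeline = joblib.load(pipeline_path)
X_new = rng.normal(size=(5, 4))
predictions = loaded_pipeline.predict(X_new)
```

The file name abcxyz.h5 in the question suggests a Keras model rather than a scikit-learn estimator. If that is the case, saving the scaler separately with joblib, as in the first approach, may be the simpler route.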
Answer 2
Score: 1
Consider the following example:
import numpy as np
from sklearn.preprocessing import MinMaxScaler
data1 = np.array([0, 1, 2, 3, 4, 5])
data2 = np.array([0, 2, 4, 6, 8, 10])
sc = MinMaxScaler()
sc.fit_transform(data1.reshape(-1, 1))
Output:
array([[0. ],
       [0.2],
       [0.4],
       [0.6],
       [0.8],
       [1. ]])
If you refit the scaler on the second data set, you get the same values after scaling, even though the raw values are twice as large:
sc.fit_transform(data2.reshape(-1, 1))
Output:
array([[0. ],
       [0.2],
       [0.4],
       [0.6],
       [0.8],
       [1. ]])
Now fit on the first data set only and reuse the same scaler for the second one. Values outside the fitted range [0, 5] are mapped outside [0, 1]:
sc.fit(data1.reshape(-1, 1))
sc.transform(data2.reshape(-1, 1))
Output:
array([[0. ],
       [0.4],
       [0.8],
       [1.2],
       [1.6],
       [2. ]])
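The last result follows directly from the min-max formula, which uses the minimum and maximum stored at fit time. The sketch below recomputes it by hand from the scaler's data_min_ and data_max_ attributes; the variable names lo, hi, and manual are chosen for the example.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

data1 = np.array([0, 1, 2, 3, 4, 5], dtype=float).reshape(-1, 1)
data2 = np.array([0, 2, 4, 6, 8, 10], dtype=float).reshape(-1, 1)

sc = MinMaxScaler()  # default feature_range is (0, 1)
sc.fit(data1)        # stores data_min_ = 0 and data_max_ = 5

# scaled = (x - data_min) / (data_max - data_min) * (hi - lo) + lo
lo, hi = sc.feature_range
manual = (data2 - sc.data_min_) / (sc.data_max_ - sc.data_min_) * (hi - lo) + lo

print(np.allclose(manual, sc.transform(data2)))  # True
print(manual.ravel())
```

If values outside the target range are undesirable for new data, MinMaxScaler also accepts clip=True (available in scikit-learn 0.24 and later), which clips transformed values to feature_range.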
Answer 3
Score: 0
You should use fit() and transform() to do that, as follows:
# Let's say you read the real-time data as new_data
featuresData = df[features].values
sc = MinMaxScaler(feature_range=(-1,1), copy=False)
featuresData = sc.fit_transform(featuresData)
new_data = sc.transform(new_data)
sc.transform will apply to new_data the same scaling that was fitted on featuresData.
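The question also asks how to inverse transform the prediction into the final result. One possible sketch follows, assuming the target was scaled with its own MinMaxScaler during training. The question does not say whether that was done, and LinearRegression, the synthetic data, and the names x_scaler and y_scaler are stand-ins chosen for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import MinMaxScaler

# Synthetic training data with a noiseless linear target (assumption).
rng = np.random.default_rng(0)
X_train = rng.uniform(0, 100, size=(200, 2))
y_train = 3.0 * X_train[:, 0] - 2.0 * X_train[:, 1] + 50.0

# Separate scalers for the features and the target, both fitted on training data.
x_scaler = MinMaxScaler(feature_range=(-1, 1))
y_scaler = MinMaxScaler(feature_range=(-1, 1))
X_scaled = x_scaler.fit_transform(X_train)
y_scaled = y_scaler.fit_transform(y_train.reshape(-1, 1))

model = LinearRegression().fit(X_scaled, y_scaled)

# New data: scale with the fitted feature scaler, predict in scaled units,
# then map the prediction back to the original target units.
new_data = np.array([[10.0, 20.0], [55.0, 5.0]])
pred_scaled = model.predict(x_scaler.transform(new_data))
pred = y_scaler.inverse_transform(pred_scaled).ravel()
```

In a real-time process, both scalers would be saved at training time and loaded alongside the model, so that neither is refitted on incoming data.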