How to rescale new data based on the old MinMaxScaler?

Question

I'm stuck on the problem of scaling new data. In my setup, I have trained and tested the model, with all of x_train and x_test scaled using sklearn's MinMaxScaler(). Now, in the real-time process, how can I scale new input to the same scale as the training and testing data?
The steps are as follows:

featuresData = df[features].values # Array of all features with the length of thousands
sc = MinMaxScaler(feature_range=(-1,1), copy=False)
featuresData = sc.fit_transform(featuresData)

# Train the final model
model.fit(X,Y)
model.predict(X_test)

# Save to abcxyz.h5

Then, with new data:

# Load the model abcxyz.h5
# Receive new data
# Scale the new data to feed into the loaded model << I'm stuck at this step
# ...

So how do I scale the new data for prediction and then inverse-transform to get the final result? By my logic, the new data needs to be scaled in the same way as by the old scaler used before training the model.

Answer 1

Score: 10

From the way you used scikit-learn, you need to have saved the fitted transformer:

import joblib
# ...
sc = MinMaxScaler(feature_range=(-1,1), copy=False)
featuresData = sc.fit_transform(featuresData)

# Save the fitted scaler to disk
joblib.dump(sc, 'sc.joblib')

# With new data: load the saved scaler and apply the same scaling
sc = joblib.load('sc.joblib')
transformData = sc.transform(newData)
# ...
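
As a minimal sketch of the save/load round trip above, including the inverse transform the question asks about: the toy arrays and the temporary directory are assumptions made up for illustration, and only the file name 'sc.joblib' comes from the answer.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical training features: 3 samples, 2 columns.
train = np.array([[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]])

sc = MinMaxScaler(feature_range=(-1, 1))
train_scaled = sc.fit_transform(train)

# Persist the fitted scaler (here in a temporary directory).
path = os.path.join(tempfile.mkdtemp(), "sc.joblib")
joblib.dump(sc, path)

# Later, in the real-time process: load the scaler and reuse it, never refit.
loaded = joblib.load(path)
new_data = np.array([[2.5, 25.0]])
new_scaled = loaded.transform(new_data)

# inverse_transform maps scaled feature values back to the original units.
restored = loaded.inverse_transform(new_scaled)
```

Note that this scaler only inverts the features it was fitted on. If the target Y was scaled as well (the question does not say), the predictions would need a second scaler fitted on Y to be inverted.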

The best way to use scikit-learn is to merge your transformations with your model in a Pipeline. That way, you only need to save the one model object, which includes the transformation step.

from sklearn import svm
from sklearn.preprocessing import MinMaxScaler
from sklearn.pipeline import Pipeline


clf = svm.SVC(kernel='linear')
sc = MinMaxScaler(feature_range=(-1,1), copy=False)

model = Pipeline([('scaler', sc), ('svc', clf)])

#...

When you call model.fit, the pipeline first calls fit_transform on your scaler under the hood and then fits the classifier on the scaled data. When you call model.predict, only the scaler's transform is applied before the classifier predicts.
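
A small end-to-end sketch of that pipeline behaviour, assuming a made-up, linearly separable toy data set (X_train, y_train and X_new are hypothetical, not from the question):

```python
import numpy as np
from sklearn import svm
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Hypothetical, linearly separable toy data.
X_train = np.array([[0.0, 100.0], [1.0, 200.0], [9.0, 900.0], [10.0, 1000.0]])
y_train = np.array([0, 0, 1, 1])

model = Pipeline([
    ("scaler", MinMaxScaler(feature_range=(-1, 1))),
    ("svc", svm.SVC(kernel="linear")),
])

# fit() fits the scaler on X_train, transforms it, then fits the SVC.
model.fit(X_train, y_train)

# predict() only calls the scaler's transform(), using the training min/max.
X_new = np.array([[0.5, 150.0], [9.5, 950.0]])
pred = model.predict(X_new)

# The fitted scaler is still accessible by its step name.
fitted_scaler = model.named_steps["scaler"]
```

Because the scaler lives inside the pipeline, raw new data can be passed straight to predict, and saving the pipeline object with joblib.dump keeps the scaler and the classifier together.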

Answer 2

Score: 1

Consider the following example:

import numpy as np
from sklearn.preprocessing import MinMaxScaler

data1 = np.array([0, 1, 2, 3, 4, 5])
data2 = np.array([0, 2, 4, 6, 8, 10])

sc = MinMaxScaler()
sc.fit_transform(data1.reshape(-1, 1))

Output:

array([[0. ],
       [0.2],
       [0.4],
       [0.6],
       [0.8],
       [1. ]])

Refitting on the second data set gives the same values after scaling, because fit_transform learns a new minimum and maximum from data2:

sc.fit_transform(data2.reshape(-1, 1))

Output:

array([[0. ],
       [0.2],
       [0.4],
       [0.6],
       [0.8],
       [1. ]])

Let's fit on the first data set and use the same scaler for the second one:

sc.fit(data1.reshape(-1, 1))
sc.transform(data2.reshape(-1, 1)) 

Output:

array([[0. ],
       [0.4],
       [0.8],
       [1.2],
       [1.6],
       [2. ]])
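
The values above 1 appear because transform reuses the minimum and maximum learned from data1 rather than recomputing them from data2. A short sketch that checks this against the formula by hand, reusing the same two arrays (the names lo, hi, manual and scaled are only for illustration):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

data1 = np.array([0, 1, 2, 3, 4, 5], dtype=float).reshape(-1, 1)
data2 = np.array([0, 2, 4, 6, 8, 10], dtype=float).reshape(-1, 1)

sc = MinMaxScaler()
sc.fit(data1)

# The statistics learned from data1 are stored on the fitted scaler.
lo, hi = sc.data_min_, sc.data_max_

# With the default feature_range=(0, 1), transform() is (x - min) / (max - min).
manual = (data2 - lo) / (hi - lo)
scaled = sc.transform(data2)
```

This is the behaviour you want for real-time data: new values are placed on the training scale even when they fall outside the training range. If out-of-range outputs are a problem for the model, scikit-learn 0.24 and later accept MinMaxScaler(clip=True), which clips transformed values to feature_range.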

Answer 3

Score: 0

You should use fit() and transform() to do that, as follows:

# Let's say you read the real-time data as new_data

featuresData = df[features].values
sc = MinMaxScaler(feature_range=(-1,1), copy=False)
featuresData = sc.fit_transform(featuresData)
new_data = sc.transform(new_data)

sc.transform applies to new_data the same scaling that was fitted on featuresData.
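
In a real-time setting the new data often arrives one observation at a time. A hedged sketch of that case, assuming a hypothetical two-feature training matrix and a single sample delivered as a 1-D array:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical training matrix with two features.
features_data = np.array([[1.0, 50.0], [3.0, 70.0], [5.0, 90.0]])

sc = MinMaxScaler(feature_range=(-1, 1))
sc.fit(features_data)

# A single real-time observation arrives as a 1-D array.
sample = np.array([4.0, 60.0])

# transform() expects shape (n_samples, n_features), so add a row axis.
sample_scaled = sc.transform(sample.reshape(1, -1))
```

The reshape(1, -1) call turns the sample into one row with as many columns as the scaler was fitted on. Passing the 1-D array directly would raise an error.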

huangapple
  • Posted on January 3, 2020, 16:47:04
  • When reposting, please keep the link to this article: https://go.coder-hub.com/59575492.html