Posted: 2020-01-03 16:47:04

How to rescale new data based on an old MinMaxScaler?

Question

I'm stuck on how to scale new data. I have trained and tested a model with all of x_train and x_test scaled using sklearn's MinMaxScaler(). Now, in the real-time process, how can I scale new input to the same scale as the training and testing data?
The steps are as follows:

featuresData = df[features].values # Array of all features with the length of thousands
sc = MinMaxScaler(feature_range=(-1,1), copy=False)
featuresData = sc.fit_transform(featuresData)

# Train the final model
model.fit(X,Y)
model.predict(X_test)

# Save to abcxyz.h5

Then, with new data:

# Load the model abcxyz.h5
# Get new data
# Scale new data to feed into the loaded model << I'm stuck at this step
# ...

So how do I scale the new data for prediction, and then inverse-transform the prediction to get the final result? By my reasoning, the new data must be scaled the same way the old scaler scaled the data before training.

Answer 1

Score: 10

Given how you are using scikit-learn, you need to save the fitted transformer at training time:

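import joblib
from sklearn.preprocessing import MinMaxScaler
# ...
sc = MinMaxScaler(feature_range=(-1, 1), copy=False)
featuresData = sc.fit_transform(featuresData)

# Save the fitted scaler alongside the model
joblib.dump(sc, 'sc.joblib')

# With new data: load the fitted scaler and only transform (do not refit)
sc = joblib.load('sc.joblib')
transformData = sc.transform(newData)
# ...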
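
As a minimal, self-contained sketch of that save-and-reload round trip: the training array, the temporary directory, and the `new_data` sample below are made up for illustration and are not from the original question.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical training features: 4 samples, 2 features.
train = np.array([[0.0, 10.0], [5.0, 20.0], [10.0, 30.0], [2.5, 15.0]])

scaler = MinMaxScaler(feature_range=(-1, 1))
scaler.fit(train)

# Persist the fitted scaler (a temporary directory is used here for illustration).
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "sc.joblib")
joblib.dump(scaler, path)

# Later, in the real-time process: load the scaler and transform only, never refit.
loaded = joblib.load(path)
new_data = np.array([[7.5, 25.0]])
scaled_new = loaded.transform(new_data)
print(scaled_new)
```

The loaded scaler carries the minimum and maximum learned from the training data, so the new sample is mapped with exactly the same parameters the model saw during training.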

A better way to use scikit-learn is to combine your transformations and your model in a single pipeline. That way, you only need to save one object, which includes the fitted transformation steps.

from sklearn import svm
from sklearn.preprocessing import MinMaxScaler
from sklearn.pipeline import Pipeline

clf = svm.SVC(kernel='linear')
sc = MinMaxScaler(feature_range=(-1, 1), copy=False)

# The scaler runs first, then the classifier
model = Pipeline([('scaler', sc), ('svc', clf)])

# ...

When you call model.fit, the pipeline first calls fit_transform on the scaler under the hood and then fits the classifier on the scaled data. When you call model.predict, the pipeline applies the scaler's transform to the input before predicting.
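
The behaviour described above can be checked with a small sketch. The toy data and the `X_new` samples are hypothetical; the point is only that the pipeline's prediction matches scaling and predicting by hand with the fitted steps.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Hypothetical, linearly separable toy data.
X_train = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0],
                    [8.0, 8.0], [9.0, 9.0], [10.0, 10.0]])
y_train = np.array([0, 0, 0, 1, 1, 1])

model = Pipeline([
    ("scaler", MinMaxScaler(feature_range=(-1, 1))),
    ("svc", SVC(kernel="linear")),
])
model.fit(X_train, y_train)  # fits the scaler, then the SVC on scaled data

# New, unscaled samples go straight into the pipeline.
X_new = np.array([[1.5, 0.5], [9.5, 8.5]])
pipeline_pred = model.predict(X_new)

# The same result, computed step by step with the fitted components.
fitted_scaler = model.named_steps["scaler"]
fitted_svc = model.named_steps["svc"]
manual_pred = fitted_svc.predict(fitted_scaler.transform(X_new))
print(pipeline_pred, manual_pred)
```

Because the scaler lives inside the pipeline, saving the fitted pipeline (for example with joblib.dump, as in the first snippet) stores the scaling parameters together with the classifier.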

Answer 2

Score: 1

Consider the following example:

import numpy as np
from sklearn.preprocessing import MinMaxScaler

data1 = np.array([0, 1, 2, 3, 4, 5])
data2 = np.array([0, 2, 4, 6, 8, 10])

sc = MinMaxScaler()
sc.fit_transform(data1.reshape(-1, 1))

Output:

array([[0. ],
       [0.2],
       [0.4],
       [0.6],
       [0.8],
       [1. ]])

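If you call fit_transform on the second data set, the scaler is refit to that data's own minimum and maximum, so you get the same scaled values even though the raw values differ: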

sc.fit_transform(data2.reshape(-1, 1))

Output:

array([[0. ],
       [0.2],
       [0.4],
       [0.6],
       [0.8],
       [1. ]])

Now fit on the first data set only and use that same scaler to transform the second one:

sc.fit(data1.reshape(-1, 1))
sc.transform(data2.reshape(-1, 1))

Output:

array([[0. ],
       [0.4],
       [0.8],
       [1.2],
       [1.6],
       [2. ]])
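
To see where the last output comes from, here is a small sketch that reproduces it by hand from the minimum and maximum stored on the fitted scaler. It assumes the default feature_range=(0, 1) used in this answer; the variable names `lo`, `hi`, and `manual` are introduced only for illustration.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

data1 = np.array([0, 1, 2, 3, 4, 5], dtype=float).reshape(-1, 1)
data2 = np.array([0, 2, 4, 6, 8, 10], dtype=float).reshape(-1, 1)

sc = MinMaxScaler()  # default feature_range=(0, 1)
sc.fit(data1)

# After fitting, the scaler stores the per-feature minimum and maximum of data1.
lo, hi = sc.data_min_, sc.data_max_

# transform() applies (x - min) / (max - min) with those stored values,
# so inputs outside the fitted range land outside [0, 1].
manual = (data2 - lo) / (hi - lo)
from_scaler = sc.transform(data2)
print(np.allclose(manual, from_scaler))
```

Values above 1 are therefore expected whenever new data exceeds the range seen during fitting; they show that the old scale is being reused rather than recomputed.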

Answer 3

Score: 0

You should fit the scaler once on the training data and then call transform() on the new data, as follows:

# Let's say you read the real-time data as new_data

featuresData = df[features].values
sc = MinMaxScaler(feature_range=(-1,1), copy=False)
featuresData = sc.fit_transform(featuresData)
new_data = sc.transform(new_data)

sc.transform applies to new_data the same scaling that was fitted on featuresData.
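
The question also asks about inverse-transforming the prediction. That step only applies if the target was scaled as well, which the original code does not show, so the following is a sketch under that assumption. The data, the separate `y_scaler`, and the LinearRegression stand-in for the asker's model are all hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import MinMaxScaler

# Hypothetical training data following y = 3 * x + 2.
X_train = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
y_train = 3.0 * X_train + 2.0

# Assumption: features and target each get their own scaler.
x_scaler = MinMaxScaler(feature_range=(-1, 1))
y_scaler = MinMaxScaler(feature_range=(-1, 1))
X_scaled = x_scaler.fit_transform(X_train)
y_scaled = y_scaler.fit_transform(y_train)

# Stand-in model trained entirely in the scaled space.
model = LinearRegression().fit(X_scaled, y_scaled)

# Real-time step: transform with the fitted feature scaler, predict,
# then undo the target scaling with the fitted target scaler.
new_data = np.array([[2.5]])
pred_scaled = model.predict(x_scaler.transform(new_data))
pred = y_scaler.inverse_transform(pred_scaled)
print(pred)
```

Both scalers would need to be saved at training time and reloaded in the real-time process, in the same way as the single scaler discussed in the answers.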
