May 28, 2023 04:58:15

English:

Do i need to use RobustScaler() and OneHotEncoder() in new data before model.predict()

Question

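Question 1: Do I need to use RobustScaler() and OneHotEncoder() on pred_list_example before calling model.predict(a)?

Answer 1: Yes. New data must go through the same preprocessing as the training data before model.predict(a). Reuse the RobustScaler and OneHotEncoder instances that were fitted on the training data and call only transform() on the new data; do not fit new ones. That way the numeric columns are scaled with the training statistics and the categories are encoded into the same columns the model saw during training.

Question 2: If the answer to the previous question is "yes", Var_to_predict will be scaled by RobustScaler(). Do I need to use RobustScaler().inverse_transform to get the original numeric value of the prediction?

Answer 2: If you scaled the target variable Var_to_predict with a RobustScaler when training the model and want predictions in the original units, apply inverse_transform from that same fitted scaler to the predictions (a fresh RobustScaler() instance has no fitted statistics). Here scaler is assumed to be the scaler that was fitted on the target:

scaled_prediction = model.predict(a)  # prediction on the scaled target's scale
original_prediction = scaler.inverse_transform(scaled_prediction.reshape(-1, 1))  # undo the scaling; the scaler expects 2D input

This maps the predictions from the scaled range back to the original range so they can be interpreted. It applies only when the target variable was scaled; if it was not, no inverse transform is needed.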

English:

Suppose I have this dataframe (in a regression problem) with numerical and categorical data:

                                            df_example
Var1_numerical   Var2_categorical   Var3_numerical   Var4_categorical    Var_to_predict
    20                red            1                    BK                  352352
    10                blue           4                    BL                  345341
     5                orange         6                    BA                  423423
     1                red            3                    BK                  342342
    90                orange         2                    BK                  456456

So, in one part of the process I will use RobustScaler() on the numeric variables and OneHotEncoder() on the categorical variables so that the model can learn from these variables. And now I will have my model trained to predict with a certain error for that prediction.

The interesting thing is to predict on new data using model.predict()

import numpy as np

pred_list_example = [15, "red", 1, "BK"]
a = np.array(pred_list_example)
a = np.expand_dims(a, 0)  # shape (1, 4): one sample, four features
model.predict(a)

Question 1: Do I need to use RobustScaler() and OneHotEncoder() on pred_list_example before using model.predict(a)?

Question 2: In case the answer to the previous question is "yes", the Var_to_predict will be scaled due to RobustScaler(). Do I need to use RobustScaler().inverse_transform to get the original numeric value of the prediction?

Answer 1

Score: 1

English:

>Question 1: Do I need to use RobustScaler() and OneHotEncoder() on pred_list_example before using model.predict(a)?

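Yes, and more than that: you must use the same fitted RobustScaler() and OneHotEncoder() instances that were used on the training data to do the transformation. Otherwise the new data won't be scaled by the same amounts, and the one-hot columns may not come out in the order the model was trained on.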
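One way to guarantee that the same fitted transformers are reused is to bundle them with the model. The following is a minimal sketch, not the asker's actual setup: it assumes scikit-learn's ColumnTransformer and Pipeline, and it uses LinearRegression purely as a stand-in because the question does not say which model is used. The data is the df_example table from the question.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, RobustScaler

# Training data copied from df_example in the question.
df_example = pd.DataFrame({
    "Var1_numerical": [20, 10, 5, 1, 90],
    "Var2_categorical": ["red", "blue", "orange", "red", "orange"],
    "Var3_numerical": [1, 4, 6, 3, 2],
    "Var4_categorical": ["BK", "BL", "BA", "BK", "BK"],
    "Var_to_predict": [352352, 345341, 423423, 342342, 456456],
})

feature_cols = ["Var1_numerical", "Var2_categorical",
                "Var3_numerical", "Var4_categorical"]
numeric_cols = ["Var1_numerical", "Var3_numerical"]
categorical_cols = ["Var2_categorical", "Var4_categorical"]

# Scale numeric columns and one-hot encode categorical columns.
preprocess = ColumnTransformer([
    ("num", RobustScaler(), numeric_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])

# Assumption: LinearRegression is only a placeholder for the real model.
model = Pipeline([
    ("preprocess", preprocess),
    ("regressor", LinearRegression()),
])

# fit() learns the scaling statistics and the category order from training data.
model.fit(df_example[feature_cols], df_example["Var_to_predict"])

# predict() applies the already fitted transformers to the raw new row.
new_row = pd.DataFrame([[15, "red", 1, "BK"]], columns=feature_cols)
prediction = model.predict(new_row)
print(prediction)
```

With this layout the new row is passed in raw form, and the pipeline calls transform() on the fitted scaler and encoder internally, so there is no separate preprocessing step to forget at prediction time.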

>Question 2: In case the answer to the previous question is "yes", the Var_to_predict will be scaled due to RobustScaler(). Do I need to use RobustScaler().inverse_transform to get the original numeric value of the prediction?

Yes, though note a subtlety: a fitted RobustScaler() expects the same number of columns it was fitted on, and scales each one by a different amount. This means that if a single scaler was fitted on X and Y together, there's no easy way to give it just your Y variable and ask it to undo the transform on that one variable.

For this reason, I suggest having two RobustScaler() instances: one for your X variables and one for your Y variable, so that you can undo scaling on a predicted Y variable without having the X variables to go with it.
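The two-scaler suggestion could look roughly like the sketch below. It is an illustration under assumptions: only the two numeric columns from the question's table are used to keep it short, LinearRegression again stands in for the unspecified model, and the names x_scaler and y_scaler are made up for this example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import RobustScaler

# Numeric features (Var1, Var3) and target from the question's table.
X_train = np.array([[20.0, 1.0], [10.0, 4.0], [5.0, 6.0], [1.0, 3.0], [90.0, 2.0]])
y_train = np.array([352352.0, 345341.0, 423423.0, 342342.0, 456456.0])

# One scaler per role, so Y can be unscaled without any X columns.
x_scaler = RobustScaler()
y_scaler = RobustScaler()

X_scaled = x_scaler.fit_transform(X_train)
# Scalers expect 2D input, so the target is reshaped to a single column.
y_scaled = y_scaler.fit_transform(y_train.reshape(-1, 1))

# Assumption: LinearRegression is only a placeholder for the real model.
model = LinearRegression().fit(X_scaled, y_scaled)

# New data: transform with the fitted X scaler, predict, then undo the Y scaling.
X_new = np.array([[15.0, 1.0]])
scaled_prediction = model.predict(x_scaler.transform(X_new))
original_prediction = y_scaler.inverse_transform(scaled_prediction.reshape(-1, 1))
print(original_prediction)
```

scikit-learn also ships sklearn.compose.TransformedTargetRegressor, which wraps a regressor together with a target transformer and applies the inverse transform inside predict(); it may be worth a look if keeping track of the second scaler by hand becomes error-prone.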

There is also the question of whether the Y variable needs to be scaled at all. Some people would say that it's not necessary. You can read a pro and con argument here.
