August 10, 2023 17:00:49


The prediction accuracies resulting from random forest regression models change each time I run the model

Question

Every time I run the RF model from the beginning, I get different accuracies. I have run the following code:

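import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# df17_tmp and col_in_3d are defined earlier in the asker's notebook
# resample rows with replacement to 6x the original length
df17_tmp1 = df17_tmp.sample(frac=6, replace=True).reset_index(drop=True)
x_3d = df17_tmp1[col_in_3d]  # features
y_3d = df17_tmp1['over/under_exc_vol(m3)'].values  # target
x_train_3d, x_test_3d, y_train_3d, y_test_3d = train_test_split(x_3d, y_3d, test_size=0.3, random_state=42)
# replace missing values with 0
x_train_3d = x_train_3d.fillna(0).reset_index(drop=True)
x_test_3d = x_test_3d.fillna(0).reset_index(drop=True)
y_train_3d[np.isnan(y_train_3d)] = 0
y_test_3d[np.isnan(y_test_3d)] = 0
rf_3d = RandomForestRegressor(n_estimators=70, random_state=42)
rf_3d.fit(x_train_3d, y_train_3d)
prediction_3d = rf_3d.predict(x_test_3d)
mse_3d = mean_squared_error(y_test_3d, prediction_3d)
rmse_3d = mse_3d**0.5
# relative error per row; a zero target gives inf (dropped below) or nan (skipped by nanmean)
abs_diff_3d = np.array(np.abs((y_test_3d - prediction_3d) / y_test_3d))
abs_diff_3d = abs_diff_3d[~np.isinf(abs_diff_3d)]
mape_3d = np.nanmean(abs_diff_3d) * 100
accuracy_3d = 100 - mape_3d  # "accuracy" = 100 - MAPE, in percent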
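The "accuracy" in this code is 100 minus a MAPE that skips rows whose target is zero. A minimal sketch of the same steps on made-up toy arrays (the values and the helper name `mape_accuracy` are hypothetical) shows how those rows are handled:

```python
import numpy as np

def mape_accuracy(y_true, y_pred):
    """100 - MAPE, computed the way the question's code does it (hypothetical helper)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Dividing by a zero target gives inf (nonzero error) or nan (0/0);
    # errstate only silences numpy's RuntimeWarnings, it does not change the values.
    with np.errstate(divide="ignore", invalid="ignore"):
        rel_err = np.abs((y_true - y_pred) / y_true)
    rel_err = rel_err[~np.isinf(rel_err)]  # drop the inf entries
    mape = np.nanmean(rel_err) * 100       # nanmean skips the nan entries
    return 100 - mape

y_true = [100.0, 200.0, 0.0, 50.0, 0.0]
y_pred = [90.0, 220.0, 5.0, 50.0, 0.0]
print(round(mape_accuracy(y_true, y_pred), 2))  # 93.33
```

Both zero-target rows are excluded, so the score comes from the remaining three rows only (relative errors of 10%, 10% and 0%).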

I got the following results in terms of accuracy:

85.94 / 85.71 / 85.83 / 82.64 / 86.56 / 85.24 / 83.40 / 82.39 / 84.98 / 83.81

So, is that normal? And which accuracy should be considered?

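In machine learning, a model's measured performance can vary with several factors, so getting a different accuracy on each run is normal. These factors can include randomness in how the data are sampled, differences in model initialization, and the way the data are split.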

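To decide which accuracy to use as the final evaluation, consider the following:

If you only care about the model's average performance, compute the mean of these accuracies and treat it as the model's performance metric.
If one particular accuracy value matters more for a specific application, choose the value that matches that application's requirements.
You can also use cross-validation to evaluate the model's performance more stably and reduce the influence of randomness.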
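Following the first suggestion, a short sketch that summarizes the ten accuracies reported in the question with Python's standard `statistics` module. Reporting the spread next to the mean makes the run-to-run variation explicit instead of singling out one run:

```python
import statistics

# The ten accuracies (%) reported in the question, one per run.
accuracies = [85.94, 85.71, 85.83, 82.64, 86.56,
              85.24, 83.40, 82.39, 84.98, 83.81]

mean_acc = statistics.mean(accuracies)
sd_acc = statistics.stdev(accuracies)  # sample standard deviation (n - 1 denominator)

print(f"mean = {mean_acc:.2f}%, sd = {sd_acc:.2f}, "
      f"range = {min(accuracies):.2f}-{max(accuracies):.2f}%")
# mean = 84.65%, sd = 1.48, range = 82.39-86.56%
```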

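In short, getting different accuracies across runs is normal; which value you ultimately report should depend on your specific needs and application context.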
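For the cross-validation route, a minimal scikit-learn sketch. It assumes synthetic data from `make_regression` in place of the question's `df17_tmp`, and it scores RMSE (which the question also computes) rather than the MAPE-based accuracy, since MAPE becomes unstable when targets are near zero. Fixing `random_state` on both the `KFold` splitter and the forest makes the fold scores reproducible:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in data (assumption): the question's df17_tmp is not available.
X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)

rf = RandomForestRegressor(n_estimators=70, random_state=42)
# Shuffle once with a fixed seed so the fold assignment is the same on every run.
cv = KFold(n_splits=5, shuffle=True, random_state=42)

# scikit-learn treats higher scores as better, so RMSE is returned negated.
scores = cross_val_score(rf, X, y, cv=cv, scoring="neg_root_mean_squared_error")
rmse_per_fold = -scores

print("RMSE per fold:", np.round(rmse_per_fold, 2))
print(f"mean RMSE = {rmse_per_fold.mean():.2f} +/- {rmse_per_fold.std(ddof=1):.2f}")
```

The mean over folds, together with its spread, is a more stable summary than any single train/test split.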


Answer 1

Score: 0

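Although you set a random_state in your train_test_split() to generate a deterministic split, and in RandomForestRegressor() to control the randomness within the algorithm, the difference comes from the random sampling you apply to your dataframe here: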

df17_tmp1 = df17_tmp.sample(frac=6, replace=True).reset_index(drop=True)

You should replace the above line with the following:

df17_tmp1 = df17_tmp.sample(frac=6, replace=True, random_state=42).reset_index(drop=True)

to get the same output on every run.
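A small sketch of why this fix works, on a hypothetical ten-row stand-in for `df17_tmp`: without `random_state`, `DataFrame.sample` draws a different sample with replacement on each call, while a fixed `random_state` reproduces the same rows. Note that `frac=6` with `replace=True` returns six times as many rows as the original frame.

```python
import pandas as pd

# Hypothetical stand-in for df17_tmp (the real data is not shown in the question).
df = pd.DataFrame({"feature": range(10), "target": [1.5 * v for v in range(10)]})

# No random_state: each call draws a new sample with replacement.
a = df.sample(frac=6, replace=True).reset_index(drop=True)
b = df.sample(frac=6, replace=True).reset_index(drop=True)

# Fixed random_state: identical draws on every call and every run.
c = df.sample(frac=6, replace=True, random_state=42).reset_index(drop=True)
d = df.sample(frac=6, replace=True, random_state=42).reset_index(drop=True)

print(len(c))       # 60 (frac=6 of 10 rows)
print(c.equals(d))  # True
print(a.equals(b))  # False unless the two random draws happen to coincide
```

Since the split and the forest are already seeded in the question's code, seeding this one call should be enough for the accuracy to repeat across runs.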

Please refer to the documentation and this thread to learn more.

