XGBoost can't predict a simple sinusoidal function


Question

I created a very simple function to test XGBoost.

X is an array of 1000 values evenly spaced between 0 and 7*np.pi.
Y is simply 1 + 0.5*np.sin(x).

I split the dataset into 800 training rows and 200 testing rows. Shuffle MUST be False to simulate future observations, making sure the last 200 rows are reserved for testing.

import numpy as np
from sklearn.model_selection import train_test_split
from matplotlib import pyplot as plt
from sklearn.metrics import mean_squared_error as MSE
from xgboost import XGBRegressor

N = 1000                       # 1000 rows
x = np.linspace(0, 7*np.pi, N) # Simple function
y = 1 + 0.5*np.sin(x)          # Generate simple function sin(x) as y

# Train-test split, intentionally use shuffle=False to simulate time series
X = x.reshape(-1,1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, shuffle=False)
### Interestingly, the model generalizes well if shuffle=True
#X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, shuffle=True)

XGB_reg = XGBRegressor(random_state=42)
XGB_reg.fit(X_train,y_train)

# EVALUATE ON TRAIN DATA
yXGBPredicted = XGB_reg.predict(X_train)
rmse = np.sqrt(MSE(y_train, yXGBPredicted))
print("RMSE TRAIN XGB: % f" %(rmse))

# EVALUATE ON TEST DATA
yXGBPredicted = XGB_reg.predict(X_test)
# XGB metrics
rmse = np.sqrt(MSE(y_test, yXGBPredicted))
print("RMSE TEST XGB: % f" %(rmse))

# Predict full dataset
yXGB = XGB_reg.predict(X)

# Plot and compare
plt.style.use('fivethirtyeight')
plt.rcParams.update({'font.size': 16})
fig, ax = plt.subplots(figsize=(10,5))
plt.plot(x, y)
plt.plot(x, yXGB)
plt.ylim(0,2)
plt.xlabel("x")
plt.ylabel("y")
plt.show()

I trained the model on the first 800 rows and then predicted the next 200 rows.

I was expecting a low RMSE on the test data, but that did not happen.

I was surprised to see that XGBoost simply repeated the last value of the training set for every predicted row (see chart).

Any ideas why this doesn't work?


Answer 1

Score: 3


You're asking your model to "extrapolate" - to make predictions for x values greater than any x value in the training dataset. Extrapolation works with some model types (such as linear models), but it typically does not work with decision tree models and their ensembles (such as XGBoost).
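
For illustration, here is a minimal toy sketch (hypothetical data, not the sine example from the question): a linear model extends the trend beyond the training range, while a decision tree can only return target values it saw during training.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

X_demo = np.arange(10, dtype=float).reshape(-1, 1)  # training inputs 0..9
y_demo = 2.0 * X_demo.ravel()                       # simple linear trend y = 2x

lin = LinearRegression().fit(X_demo, y_demo)
tree = DecisionTreeRegressor().fit(X_demo, y_demo)

X_out = np.array([[20.0]])   # well outside the training range
print(lin.predict(X_out))    # ~[40.] - keeps following the trend
print(tree.predict(X_out))   # [18.] - clamped to the largest training target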

If you switch from XGBoost to LightGBM, you can train extrapolation-capable decision tree ensembles using the "linear tree" approach, as sketched below.
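
A minimal sketch of what that could look like, assuming LightGBM >= 3.0 (where the linear_tree parameter was added) and reusing X_train, y_train, X_test, y_test and MSE from the question:

import lightgbm as lgb

params = {
    "objective": "regression",
    "linear_tree": True,   # fit a linear model in each leaf instead of a constant
    "verbosity": -1,
}
train_set = lgb.Dataset(X_train, label=y_train)
booster = lgb.train(params, train_set, num_boost_round=100)

y_lgb = booster.predict(X_test)
print("RMSE TEST LGBM: %f" % np.sqrt(MSE(y_test, y_lgb)))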

> Any ideas why this doesn't work?

Your XGBRegressor is probably over-fitted (it uses n_estimators = 100 and max_depth = 6 by default). If you decrease those parameter values, the red line will appear more jagged, and it will be easier for you to see it "working".
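
For example (a sketch; the reduced values below are just for illustration):

XGB_small = XGBRegressor(n_estimators=10, max_depth=2, random_state=42)
XGB_small.fit(X_train, y_train)

plt.plot(x, y)                      # true function
plt.plot(x, XGB_small.predict(X))   # coarser, visibly step-wise fit
plt.show()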

Right now, if you ask your over-fitted XGBRegressor to extrapolate, it basically functions as a giant look-up table. When extrapolating towards +Inf, the "closest match" is at x = 17.5; when extrapolating towards -Inf, the "closest match" is at x = 0.0.
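
A quick way to check this (a sketch reusing the fitted XGB_reg from the question): any input beyond the training range falls into the same terminal leaves, so the predictions are identical.

far_right = XGB_reg.predict(np.array([[25.0], [100.0], [1000.0]]))
far_left = XGB_reg.predict(np.array([[-1.0], [-100.0]]))
print(far_right)  # three identical values - the prediction near x = 17.5
print(far_left)   # two identical values - the prediction near x = 0.0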

Posted by huangapple on 2023-02-18 20:46. Source: https://go.coder-hub.com/75493439.html