BigQuery ML forecast using ARIMA_PLUS (mostly) ignores the holiday effect
Question
I'm trying to forecast daily sales for a business using the past 5 years of daily sales history. This business has very clear holiday sales patterns, including zero sales on Easter/Thanksgiving/Christmas and unusually high sales on Valentine's Day, Mother's Day, etc.
Only two dates in the resulting model have a non-zero holiday effect: December 23 and Presidents' Day. I cannot figure out why.
I'm using the ARIMA_PLUS model type in BigQuery ML, with the holiday_region set to 'US'. The data is simply date and sales.
CREATE OR REPLACE MODEL `model_name`
OPTIONS(MODEL_TYPE='ARIMA_PLUS',
time_series_timestamp_col='date',
time_series_data_col='sales',
data_frequency='DAILY',
holiday_region='US') AS
( SELECT date, sales FROM `table_name`);
An example of an obvious forecasting error is that April 9, 2023 (Easter) is forecasted as a normal sales day (instead of zero), and April 17, 2023 is forecasted to be zero. The business was closed on April 17, 2022 due to Easter, so the model is clearly ignoring the Easter holiday and providing an inaccurate forecast as a result.
I used the EXPLAIN_FORECAST function to see if any days had a holiday effect, which is where I found it was just December 23 and President's Day. There were adjustments for those two days across each of the five years of history. The holiday_effect field was zero for the rest of the dates.
SELECT * FROM ML.EXPLAIN_FORECAST(MODEL `model_name`, STRUCT(14 AS horizon, 0.8 AS confidence_level));
Any idea why the model ignores all holidays except these two? Is this a defect in BigQuery ML, or is there something wrong with my setup?
Answer 1
Score: 2
The dip on 2023-04-17 is probably caused by the YEARLY seasonality effect.
In your model creation statement, SEASONALITIES defaults to AUTO. Because your data frequency is DAILY and the time series is longer than 2 years, the model automatically infers WEEKLY and YEARLY seasonality. The YEARLY component is what carries the dip observed on 2022-04-17 (Easter that year) forward to the same calendar date in 2023.
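One way to check this is to look at the decomposition for the dates in question. The query below is a minimal sketch that reuses the `model_name` placeholder and the horizon from the question; it assumes the ML.EXPLAIN_FORECAST output exposes the seasonal_period_yearly, seasonal_period_weekly, and holiday_effect columns, and the 2023 dates only appear if they fall inside the forecast horizon.

```sql
-- Compare the yearly seasonal component with the holiday effect
-- on the dates discussed in the question.
SELECT
  time_series_timestamp,
  time_series_type,         -- 'history' or 'forecast'
  time_series_data,
  seasonal_period_yearly,
  seasonal_period_weekly,
  holiday_effect
FROM ML.EXPLAIN_FORECAST(
  MODEL `model_name`,
  STRUCT(14 AS horizon, 0.8 AS confidence_level))
WHERE DATE(time_series_timestamp) IN (
  DATE '2022-04-17',        -- Easter 2022 (business closed)
  DATE '2023-04-09',        -- Easter 2023
  DATE '2023-04-17')        -- date forecasted as zero
ORDER BY time_series_timestamp;
```

If the explanation above is right, the row for 2023-04-17 should show a large negative seasonal_period_yearly value and a holiday_effect of zero.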
To work around that, you can explicitly set SEASONALITIES = ['WEEKLY'] in the OPTIONS clause so that the model does not fit a YEARLY seasonality.
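Applied to the statement from the question, the change might look like the sketch below. `model_name` and `table_name` are the question's placeholders, and the option name is assumed to be SEASONALITIES (plural), which is how the CREATE MODEL reference for ARIMA_PLUS spells it.

```sql
-- Same model as in the question, with yearly seasonality switched off.
CREATE OR REPLACE MODEL `model_name`
OPTIONS(
  MODEL_TYPE = 'ARIMA_PLUS',
  TIME_SERIES_TIMESTAMP_COL = 'date',
  TIME_SERIES_DATA_COL = 'sales',
  DATA_FREQUENCY = 'DAILY',
  HOLIDAY_REGION = 'US',
  -- Fit only the weekly cycle; the default ['AUTO'] would also infer YEARLY here.
  SEASONALITIES = ['WEEKLY']
) AS
SELECT date, sales FROM `table_name`;
```

This only removes the yearly component that shifts last year's Easter dip onto the wrong date. Whether the holiday effect then picks up Easter and the other holidays is worth verifying by rerunning ML.EXPLAIN_FORECAST on the retrained model.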