Understanding the `period` parameter in `statsmodels.tsa.seasonal`
Question
So I am new to time series analysis and want to check my data for seasonality and trend.
I tried using both `STL` and `seasonal_decompose` from the `statsmodels.tsa.seasonal` module.
My question is about an input parameter that is called `period` in both cases and is described as
> Periodicity of the sequence

for `STL` and as
> Period of the series

for `seasonal_decompose`.
However, from what I understood, they are different.
Based on this answer, the `STL` `period` parameter is defined as the expected seasonality of my series; for daily data (which is my case), for example, it would be 365.
However, for `seasonal_decompose` I understand it to be the total number of samples regardless of the time resolution. For example, if I have samples taken every hour, it would be 24 in my case.
I reached this conclusion because of the error I got when using `seasonal_decompose` with `period=365` on a 6-day time series:
`ValueError: x must have 2 complete cycles requires 730 observations. x only has 238 observation(s)`
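The numbers in that message follow from the rule it states: the decomposition needs at least two complete cycles, i.e. `2 * period` observations. A minimal sketch of that arithmetic in plain Python (`required_observations` is a hypothetical helper that mirrors the message, not statsmodels code):

```python
def required_observations(period: int, cycles: int = 2) -> int:
    """Observations needed to cover `cycles` complete seasonal cycles.

    Hypothetical helper that mirrors the rule stated in the ValueError
    message ("x must have 2 complete cycles"); it is not statsmodels code.
    """
    if period < 1:
        raise ValueError("period must be a positive number of samples per cycle")
    return cycles * period


n_obs = 238   # observations in the 6-day series from the question
period = 365  # value passed to seasonal_decompose

needed = required_observations(period)
print(needed, n_obs >= needed)  # 730 False
```

With `period=365`, two cycles of daily samples means two years of data, which a 6-day series cannot provide.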
Are they indeed different? Did I understand both cases correctly? And if I did, would this imply that `seasonal_decompose` cannot work with unevenly spaced samples? (In my case the samples are taken at somewhat random dates and times, so the `STL` parameter adapts much better.) Is there a workaround for `seasonal_decompose` on unevenly spaced samples?
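On the workaround question: both `STL` and `seasonal_decompose` treat the input as a sequence of equally spaced observations and work by position, not by actual timestamp, so neither handles uneven spacing by itself. One common workaround, for either function, is to resample the irregular samples onto a regular grid and interpolate the gaps. A minimal sketch in plain Python, assuming hourly bins suit the data (`to_regular_hourly` is a hypothetical helper; in pandas, `Series.resample(...).mean()` followed by `.interpolate()` plays the same role):

```python
from collections import defaultdict
from datetime import datetime, timedelta


def to_regular_hourly(samples):
    """Turn irregular (timestamp, value) samples into one value per hour.

    Hypothetical helper: each sample goes into the bin of the hour it falls in,
    bins are averaged, and empty hours between two non-empty ones are filled by
    linear interpolation. Returns a list of (hour_start, value) pairs.
    """
    if not samples:
        return []

    bins = defaultdict(list)
    for ts, value in samples:
        bins[ts.replace(minute=0, second=0, microsecond=0)].append(value)

    start, end = min(bins), max(bins)
    n_hours = (end - start) // timedelta(hours=1) + 1
    grid = [start + timedelta(hours=i) for i in range(n_hours)]

    means = []
    for hour in grid:
        values = bins.get(hour)  # .get does not create empty bins
        means.append(sum(values) / len(values) if values else None)

    # First and last bins are never empty, so every gap has a neighbour on both sides.
    known = [i for i, v in enumerate(means) if v is not None]
    for a, b in zip(known, known[1:]):
        for i in range(a + 1, b):
            means[i] = means[a] + (means[b] - means[a]) * (i - a) / (b - a)

    return list(zip(grid, means))


samples = [
    (datetime(2023, 5, 1, 0, 10), 1.0),
    (datetime(2023, 5, 1, 0, 50), 3.0),
    (datetime(2023, 5, 1, 3, 5), 8.0),
]
for hour, value in to_regular_hourly(samples):
    print(hour.strftime("%H:%M"), value)
# 00:00 2.0
# 01:00 4.0
# 02:00 6.0
# 03:00 8.0
```

On the resulting hourly series, a daily seasonality corresponds to `period=24` for either function.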
The more I read, the less I understand. This code maps the sampling frequency to the `period` parameter. From its docstring:
> Annual maps to 1, quarterly maps to 4, monthly to 12, weekly to 52.
So it seems to map the sampling frequency `freq` to an integer meaning samples per year. So far so good, until we see that it does:
```python
elif freq == "D":
    return 7
elif freq == "H":
    return 24
```
So for daily data it maps to a week, and for hourly data it maps to a day!
Please give me a hand here! I am completely lost!
Answer 1
Score: 0
OK, I think I finally understand. `period` can be defined as the expected number of samples in one full cycle (repetition) of the seasonal component.
Basically, you look at your time series, see how long it takes to repeat itself, and count the number of samples within that time frame.
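That definition is simple arithmetic: divide the length of one seasonal cycle by the constant sampling interval. A minimal sketch, assuming evenly spaced samples and a cycle that spans a whole number of samples (`samples_per_cycle` is a hypothetical helper):

```python
from datetime import timedelta


def samples_per_cycle(sampling_interval: timedelta, cycle_length: timedelta) -> int:
    """Number of evenly spaced samples in one full seasonal cycle (the `period`).

    Hypothetical helper: raises if the cycle is not a whole number of samples,
    because `period` has to be an integer.
    """
    if sampling_interval <= timedelta(0):
        raise ValueError("sampling_interval must be positive")
    count, remainder = divmod(cycle_length, sampling_interval)
    if remainder:
        raise ValueError("cycle_length is not a whole number of sampling intervals")
    return count


print(samples_per_cycle(timedelta(hours=1), timedelta(days=1)))     # 24: hourly data, daily cycle
print(samples_per_cycle(timedelta(minutes=15), timedelta(days=1)))  # 96: 15-minute data, daily cycle
print(samples_per_cycle(timedelta(days=1), timedelta(weeks=1)))     # 7: daily data, weekly cycle
print(samples_per_cycle(timedelta(days=1), timedelta(days=365)))    # 365: daily data, yearly cycle (ignoring leap days)
```

The same integer is what both `STL` and `seasonal_decompose` expect as `period`.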
As for the function that converts `freq` to `period`, it simply "assumes" that hourly data repeats its pattern daily, daily data repeats itself weekly, and anything coarser than daily has yearly seasonality.
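Those defaults can be written out as a small lookup table in which each number is "samples per assumed cycle". This sketch only re-creates the mappings quoted in the question; it is a hypothetical table, not statsmodels' own function, which accepts more pandas frequency aliases. Passing `period` explicitly to `STL` or `seasonal_decompose` overrides any such default.

```python
# Simplified re-creation of the defaults quoted in the question. Hypothetical
# table: it is not statsmodels' freq_to_period, which accepts many more pandas
# frequency aliases. Each entry is (assumed seasonal cycle, samples per cycle).
DEFAULT_SEASONALITY = {
    "A": ("year", 1),   # annual samples
    "Q": ("year", 4),   # quarterly samples
    "M": ("year", 12),  # monthly samples
    "W": ("year", 52),  # weekly samples
    "D": ("week", 7),   # daily samples: assumed weekly pattern
    "H": ("day", 24),   # hourly samples: assumed daily pattern
}


def default_period(freq: str) -> int:
    """Default `period` (samples per assumed cycle) for a sampling frequency."""
    try:
        _cycle, period = DEFAULT_SEASONALITY[freq.upper()]
    except KeyError:
        raise ValueError(f"no default seasonality for frequency {freq!r}") from None
    return period


for freq, (cycle, period) in DEFAULT_SEASONALITY.items():
    print(f"{freq}: {period} sample(s) per {cycle}")
```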