Fill time series pandas dataframe with synthetic data that has a similar shape as the original data
Question
I have a time series in pandas with a large gap in the middle. I would like to fill that gap with "synthetic" data that resembles the shape and trend of the existing data.
I have tried linear, cubic, and spline interpolation, but they lose the noise and general shape of the data. They pretty much just draw a line through all the nulls.
Below is a graph of the data. Is there any library that can create this data?
Answer 1
Score: 1
You can try using Prophet to forecast across the gap and fill the missing data with the predictions. This assumes that your missing data is NaN, not 0, and that all the missing data forms one contiguous block.
This is just a quick example, and you will probably need to adjust the seasonality to get a better fit.
import pandas as pd
import numpy as np
from prophet import Prophet
# sample data (float dtype so the column can hold NaN without upcasting)
np.random.seed(0)
arr = np.random.randint(1, 200, 100).astype(float)
df = pd.DataFrame(arr, columns=['y'])
df['ds'] = pd.date_range('2023-01-01', periods=100)
df.iloc[50:90, 0] = np.nan
og_df = df.copy()
# find the first NaN and use everything before it as the training set
# (idxmax returns a label; it equals the position here because of the default RangeIndex)
gap = df['y'].isna()
first_nan = gap.idxmax()
train = df.iloc[:first_nan]
# number of periods to predict
periods = int(gap.sum())
# fit the model, create a future DataFrame, and forecast
# add seasonality based on your actual data to get a better fit
m = Prophet()
m.fit(train)
future = m.make_future_dataframe(periods=periods)  # add the freq param if not using daily data, e.g. freq='1h'
forecast = m.predict(future)
# assign the forecasted values to the gap rows of the original frame (aligned on index)
missing_data = forecast.iloc[first_nan:][['ds', 'yhat']].rename(columns={'yhat': 'y'})
df.loc[gap, 'y'] = missing_data['y']
# sample plots: original data with the gap, then the filled data
og_df.plot(x='ds', y='y', ylim=(0, 500))
df.plot(x='ds', y='y', ylim=(0, 500))
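Keep in mind that Prophet's yhat is a smooth trend-plus-seasonality estimate, so the filled gap can still look cleaner than the real data. The following is a minimal sketch, not part of the original answer, of one way to put noise back: take a smooth baseline across the gap and add residuals resampled from the observed points. The function name fill_gap_with_noise and the window parameter are hypothetical, and linear interpolation is used as the baseline only to keep the example free of extra dependencies; the same resampling idea could be applied on top of a Prophet forecast.

```python
import numpy as np
import pandas as pd


def fill_gap_with_noise(s: pd.Series, window: int = 7, seed: int = 0) -> pd.Series:
    """Fill NaNs with an interpolated baseline plus resampled residuals.

    Hypothetical helper for illustration; `s` is assumed to be a float Series.
    """
    rng = np.random.default_rng(seed)
    # Smooth baseline across the gap (stand-in for any smooth forecast).
    baseline = s.interpolate(method="linear", limit_direction="both")
    # Residuals of the observed points around a centered rolling mean.
    smooth = s.rolling(window, center=True, min_periods=1).mean()
    resid = (s - smooth).dropna().to_numpy()
    # Draw one residual per missing point, with replacement.
    missing = s.isna()
    noise = rng.choice(resid, size=int(missing.sum()), replace=True)
    out = s.copy()
    out[missing] = baseline[missing].to_numpy() + noise
    return out


# Usage on toy data: a noisy upward trend with a 40-point gap.
toy_rng = np.random.default_rng(1)
s = pd.Series(np.linspace(0, 10, 100) + toy_rng.normal(0, 1, 100))
s.iloc[50:90] = np.nan
filled = fill_gap_with_noise(s)
```

Resampling residuals independently ignores any autocorrelation in the noise, so treat the result as a visual stand-in rather than a statistically faithful reconstruction.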
Answer 2
Score: 0
If you expect the missing stretch of data to behave like the existing data, you can experiment with TimeGAN to generate data to fill the gap. You can give ydata-synthetic a try for this.
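Sequence generators such as TimeGAN are trained on many fixed-length sequences rather than on one long series, so a typical first step is to cut the observed data into overlapping windows. The sketch below shows only that preparation step under stated assumptions: the function name make_windows and the seq_len value are hypothetical, windows that touch the gap are simply dropped, and the ydata-synthetic training call itself is deliberately not shown because its API should be taken from that library's documentation.

```python
import numpy as np


def make_windows(values, seq_len: int) -> np.ndarray:
    """Return NaN-free overlapping windows with shape (n_windows, seq_len, 1).

    Hypothetical helper for illustration only.
    """
    x = np.asarray(values, dtype=float)
    # All overlapping windows of length seq_len, shape (len(x) - seq_len + 1, seq_len).
    windows = np.lib.stride_tricks.sliding_window_view(x, seq_len)
    # Keep only windows that lie entirely in observed data.
    keep = ~np.isnan(windows).any(axis=1)
    # Boolean indexing copies the data; add a trailing feature axis.
    return windows[keep][..., None]


# Usage: 100 points with a gap at positions 50..89, windows of length 10.
x = np.arange(100, dtype=float)
x[50:90] = np.nan
w = make_windows(x, 10)
```

Models of this kind usually also expect scaled inputs (for example min-max scaling to [0, 1]), which is left out here to keep the sketch short.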