

Fill time series pandas dataframe with synthetic data that has a similar shape as the original data


I have a time series in pandas with a large gap in the middle. I would like to fill that gap with "synthetic" data that follows the same shape and trend as the existing data.

I have tried linear, cubic, and spline interpolation, but the noise and general shape of the data are lost. Each method pretty much just draws a line through all the nulls.

Below is a graph of the data. Is there any library that can create this data?




Score: 1



You can try using Prophet to forecast across the gap and fill the missing values with the predictions. This assumes that your missing data is NaN, not 0, and that the missing values form one contiguous block.

This is just a quick example; you will probably need to adjust the seasonality settings to get a better fit.

import numpy as np
import pandas as pd
from prophet import Prophet

# sample data: 100 daily values with one contiguous block of 40 NaNs
arr = np.random.randint(1, 200, 100).astype(float)  # float so the column can hold NaN
df = pd.DataFrame(arr, columns=['y'])
df['ds'] = pd.date_range('2023-01-01', periods=100)
df.iloc[50:90, 0] = np.nan
og_df = df.copy()

# find the position of the first NaN and train on everything before it
# (rows after the gap are not used for training in this example)
first_nan = int(df['y'].isna().to_numpy().argmax())
train = df.iloc[:first_nan]
# find the number of periods to predict
periods = int(df['y'].isna().sum())

# fit your model, create a future DataFrame, and forecast
# add seasonality based on your actual data to get a better fit
m = Prophet()
m.fit(train)  # the model must be fitted before make_future_dataframe / predict
future = m.make_future_dataframe(periods=periods)  # add freq param if not using daily: freq='1h'
forecast = m.predict(future)

# assign the forecasted values to the missing rows of the original frame
# forecast holds the training rows first, then one row per predicted period
df.loc[df['y'].isna(), 'y'] = forecast['yhat'].iloc[first_nan:].to_numpy()

# sample plot
og_df.plot(x='ds', y='y', ylim=(0, 500))
df.plot(x='ds', y='y', ylim=(0, 500))
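
A forecast tends to be smoother than the observed series, so the filled gap may still lack the noise the question asks for. One possible way to put some of it back, sketched here under assumptions that are not part of the original answer, is to resample residuals from the observed data and add them to the smooth fill. The helper name `add_resampled_noise`, the 7-point rolling window, and the demo data are all hypothetical, and linear interpolation only stands in for whatever smooth fill you use (for example the forecasted values).

```python
import numpy as np
import pandas as pd


def add_resampled_noise(original, filled, window=7, seed=0):
    """Add noise resampled from observed residuals to the filled gap.

    original: Series with NaN where data is missing.
    filled:   Series on the same index, gap already filled smoothly.
    """
    rng = np.random.default_rng(seed)
    gap = original.isna()

    # Residuals: observed values minus their own centered rolling mean.
    trend = original.rolling(window, center=True, min_periods=1).mean()
    residuals = (original - trend).dropna().to_numpy()

    # Draw one residual (with replacement) for every missing point.
    noise = rng.choice(residuals, size=int(gap.sum()), replace=True)

    result = filled.copy()
    result.loc[gap] = filled.loc[gap].to_numpy() + noise
    return result


# Hypothetical demo data: noisy upward trend with a 40-point gap.
demo_rng = np.random.default_rng(42)
idx = pd.date_range("2023-01-01", periods=100)
y = pd.Series(np.linspace(50, 150, 100) + demo_rng.normal(0, 10, 100), index=idx)
y.iloc[50:90] = np.nan

smooth = y.interpolate(method="linear")  # stand-in for any smooth fill
synthetic = add_resampled_noise(y, smooth)
```

Observed points are left untouched; only the gap receives noise. Independent resampling ignores any autocorrelation in the residuals, so treat the result as a rough visual match rather than a statistically faithful reconstruction.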



Score: 0



If you expect the missing stretch to behave like the existing data, you can experiment with TimeGAN to generate data that fills the gap. The ydata-synthetic package is worth a try for this.

  • Published on May 11, 2023, 03:28:58
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76221977.html



