Fill a gap in a pandas time series DataFrame with synthetic data that has a similar shape to the original data

Question

I have a time series in pandas with a large gap in the middle. I would like to fill that gap with "synthetic" data that resembles the shape and trend of the existing data.

I have tried linear, cubic, and spline interpolation, but the noise and general shape of the data are lost. Each method pretty much just draws a line through all the nulls.

Below is a graph of the data. Is there any library that can create this data?

[Image: graph of the time series with the gap]

Answer 1

Score: 1

You can try using Prophet to forecast values that fill the missing data. This assumes that your missing data is NaN rather than 0 and that all of the missing rows are contiguous.

This is just a quick example, and you will probably need to adjust the seasonality to get a better fit.

import pandas as pd
import numpy as np
from prophet import Prophet

# sample data: 100 daily values with a 40-day gap of NaNs
np.random.seed(0)
arr = np.random.randint(1, 200, 100).astype(float)  # float so the column can hold NaN
df = pd.DataFrame(arr, columns=['y'])
df['ds'] = pd.date_range('2023-01-01', periods=100)
df.iloc[50:90, 0] = np.nan
og_df = df.copy()

# mask of the missing rows and the position of the first NaN
missing = df['y'].isna()
first_nan = int(missing.to_numpy().argmax())

# train on the rows before the gap
train = df.iloc[:first_nan]
# number of periods to predict
periods = int(missing.sum())

# fit your model, create a future DataFrame, and forecast
# add seasonality based on your actual data to get a better fit
m = Prophet()
m.fit(train)
future = m.make_future_dataframe(periods=periods)  # add freq param if not using daily: freq='1h'
forecast = m.predict(future)

# assign the forecast to the missing rows of the original frame
# (relies on the gap being one contiguous block, as assumed above)
df.loc[missing, 'y'] = forecast['yhat'].iloc[first_nan:].to_numpy()

# sample plot: original data with the gap, then the filled data
og_df.plot(x='ds', y='y', ylim=(0, 500))
df.plot(x='ds', y='y', ylim=(0, 500))
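
A forecast such as `yhat` is a smooth trend-plus-seasonality estimate, so the filled stretch may still look flatter than the observed data. One possible way to bring back some of the noise is to resample the residuals of the smooth estimate on the observed rows and add them to the gap. The sketch below is only an illustration under stated assumptions: `add_resampled_noise` is a hypothetical helper, the toy series is made up, and a fitted straight line stands in for the forecast so the snippet runs without Prophet.

```python
import numpy as np
import pandas as pd


def add_resampled_noise(observed, smooth, seed=0):
    """Fill NaNs in `observed` with `smooth` plus bootstrapped residuals.

    observed: Series with NaN in the gap.
    smooth:   Series on the same index with a smooth estimate at every row
              (for example a forecast or an interpolation).
    """
    rng = np.random.default_rng(seed)
    gap = observed.isna()
    # residuals of the smooth estimate on the rows that were actually observed
    residuals = (observed - smooth)[~gap].to_numpy()
    # draw one residual, with replacement, for every missing row
    noise = rng.choice(residuals, size=int(gap.sum()), replace=True)
    filled = observed.copy()
    filled.loc[gap] = smooth.loc[gap].to_numpy() + noise
    return filled


# toy data (assumption): linear trend plus Gaussian noise, with a 40-day gap
idx = pd.date_range("2023-01-01", periods=100)
t = np.arange(100)
y = pd.Series(0.5 * t + np.random.default_rng(1).normal(0, 5, 100), index=idx)
y.iloc[50:90] = np.nan

# stand-in for a smooth forecast: a straight line fitted to the observed points
seen = y.notna().to_numpy()
slope, intercept = np.polyfit(t[seen], y.to_numpy()[seen], 1)
smooth = pd.Series(slope * t + intercept, index=idx)

filled = add_resampled_noise(y, smooth)
```

Drawing residuals independently ignores any autocorrelation in the noise; if neighbouring points in your data move together, resampling short blocks of consecutive residuals may look more realistic.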

Answer 2

Score: 0

If you expect the missing stretch of data to behave like the existing data, you can experiment with TimeGAN to generate data that fills the gap. You can give ydata-synthetic a try for this.
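
TimeGAN-style generators are typically trained on fixed-length windows cut from the observed series rather than on one long sequence. As a minimal sketch of that preparation step only (it does not call ydata-synthetic, and `observed_windows` and `seq_len=10` are hypothetical choices for illustration), the snippet below collects every window that does not overlap the gap:

```python
import numpy as np
import pandas as pd


def observed_windows(series, seq_len):
    """Return every length-`seq_len` window of `series` that contains no NaN."""
    values = series.to_numpy(dtype=float)
    if len(values) < seq_len:
        return np.empty((0, seq_len))
    # all overlapping windows, shape (len(values) - seq_len + 1, seq_len)
    windows = np.lib.stride_tricks.sliding_window_view(values, seq_len)
    # drop any window that touches a missing value
    keep = ~np.isnan(windows).any(axis=1)
    return windows[keep].copy()


# toy data (assumption): 100 daily values with a gap at positions 50..89
values = np.arange(100, dtype=float)
values[50:90] = np.nan
series = pd.Series(values, index=pd.date_range("2023-01-01", periods=100))

# 41 windows fit before the gap and 1 after it
windows = observed_windows(series, seq_len=10)
print(windows.shape)
```

The resulting array has one row per training sequence. How it is passed to a generator depends on the library and version you use, so check its documentation for the expected input shape and scaling.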

Posted by huangapple on 2023-05-11 03:28:58
Original link: https://go.coder-hub.com/76221977.html