June 18, 2023, 19:52:12

How to interpolate monthly frequency sample data's missing values with interp1d(x, y) from scipy

Question

I have created monthly sample data named data, in which some months have missing values, and I would like to fill them in using interp1d(). I implemented this with the code below, but the interpolated column is still entirely NaN, and I don't know where the problem lies. How should I modify the code? Many thanks.

import pandas as pd
import numpy as np
from scipy.interpolate import interp1d
# Create an example DataFrame
data = pd.DataFrame({
    'value': [1.0, 1.2, np.nan, 1.4, 1.6, np.nan, 1.8, 2.0, np.nan, 2.2, 2.4, np.nan]
}, index=pd.date_range('2000-01-01', periods=12, freq='M'))
# Convert the index to a DateTimeIndex
data.index = pd.to_datetime(data.index)
# Convert the DateTimeIndex to a PeriodIndex with monthly frequency
x = data.index.to_period('M')
# Convert the period index to integers
x = x.astype(int)
# Convert the 'y' column to a numpy array
y = data['value'].values
# Create the interpolation function
f = interp1d(x, y, kind='linear', fill_value="extrapolate")
# Create a boolean mask that selects the missing values in the 'value' column
mask = np.isnan(data['value'])
# Create an array with the 'x' values where 'y' is missing
x_new = pd.date_range(start=data.index.min(), end=data.index.max(), freq='M')[mask]
# Convert the 'x_new' values to dates with monthly frequency
x_new_dates = pd.date_range(start=x_new.min(), end=x_new.max(), freq='M')
# Interpolate the missing 'y' values
y_new = f(x_new_dates.astype(int))
# Create a new column 'value_c' and fill it with the original data
# Insert the interpolated 'y' values into the new column
data.loc[x_new_dates, 'value_interpolated'] = y_new
# Print the DataFrame
print(data)

Out:

            value  value_interpolated
2000-01-31    1.0                 NaN
2000-02-29    1.2                 NaN
2000-03-31    NaN                 NaN
2000-04-30    1.4                 NaN
2000-05-31    1.6                 NaN
2000-06-30    NaN                 NaN
2000-07-31    1.8                 NaN
2000-08-31    2.0                 NaN
2000-09-30    NaN                 NaN
2000-10-31    2.2                 NaN
2000-11-30    2.4                 NaN
2000-12-31    NaN                 NaN

Answer 1

Score: 1

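You can interpolate the values using the number of seconds elapsed since a reference time (below, the first date in the index), as shown in this answer. The key change is that the interpolation function is built only from the non-missing points. I can't guarantee the accuracy of the results, since a third of the values are missing and the last one has to be extrapolated.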

import pandas as pd
import numpy as np
from scipy.interpolate import interp1d

# Note: pandas 2.2 and later prefer the month-end alias "ME" over "M"
data = pd.DataFrame({
    "value": [1.0, 1.2, np.nan, 1.4, 1.6, np.nan, 1.8, 2.0, np.nan, 2.2, 2.4, np.nan]
}, index=pd.date_range("2000-01-01", periods=12, freq="M"))
data.index = pd.to_datetime(data.index)
mask = ~np.isnan(data["value"])     # True where a value is present
dref = data.index[0]                # reference time: the first date
# Seconds since the reference time, restricted to the known points
x = (data.index - dref).total_seconds()[mask]
y = data["value"][mask].to_numpy()
# Fit on the known points only; extrapolate beyond the last known date
f = interp1d(x, y, fill_value="extrapolate")
# Evaluate at every date, including the ones that were missing
y_new = f((data.index - dref).total_seconds())
data["value_interpolated"] = y_new
print(data)

Out:

            value  value_interpolated
2000-01-31    1.0            1.000000
2000-02-29    1.2            1.200000
2000-03-31    NaN            1.301639
2000-04-30    1.4            1.400000
2000-05-31    1.6            1.600000
2000-06-30    NaN            1.698361
2000-07-31    1.8            1.800000
2000-08-31    2.0            2.000000
2000-09-30    NaN            2.098361
2000-10-31    2.2            2.200000
2000-11-30    2.4            2.400000
2000-12-31    NaN            2.606667
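
As for why the code in the question returns only NaN, two things appear to be involved. First, interp1d is built from a y array that still contains NaN, and the SciPy documentation describes NaN in the inputs as undefined behaviour; in practice the NaN usually propagates into every segment that touches it. Second, x is built from monthly period ordinals while the query points come from x_new_dates.astype(int), which are nanosecond timestamps, so the queries likely fall far outside the fitted range. The following is a minimal sketch of the first effect on a three-point toy series; the names f_all, f_known, with_nan and without_nan are illustrative and not from the original post.

```python
import numpy as np
from scipy.interpolate import interp1d

# Toy series (hypothetical data): the middle point is missing.
x = np.array([0.0, 1.0, 2.0])
y = np.array([1.0, np.nan, 3.0])

# Built on all points, the NaN takes part in the slope of its neighbouring segments.
f_all = interp1d(x, y, kind="linear", fill_value="extrapolate")
with_nan = float(f_all(1.0))

# Built on the known points only, the gap is bridged by a straight line.
known = ~np.isnan(y)
f_known = interp1d(x[known], y[known], kind="linear", fill_value="extrapolate")
without_nan = float(f_known(1.0))

print(with_nan, without_nan)
```

Filtering out the missing points before calling interp1d, as the answer does with mask, avoids the first problem, and evaluating f on the same kind of x values it was fitted on avoids the second.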

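If you would rather treat every month as one equal step, which is what the period-based attempt in the question seems to aim for, a variant of the same idea is sketched below. It is an assumption-laden sketch rather than the answer's method: month_number is a hypothetical helper column computed as year * 12 + month, and the index uses month-start dates (freq="MS") only because that alias is spelled the same across pandas versions.

```python
import numpy as np
import pandas as pd
from scipy.interpolate import interp1d

values = [1.0, 1.2, np.nan, 1.4, 1.6, np.nan, 1.8, 2.0, np.nan, 2.2, 2.4, np.nan]
# Assumption: month-start dates instead of the month-end dates used in the post.
index = pd.date_range("2000-01-01", periods=12, freq="MS")
data = pd.DataFrame({"value": values}, index=index)

# One number per calendar month, so consecutive months are exactly one unit apart.
month_number = np.asarray(data.index.year * 12 + data.index.month, dtype=float)

# Fit on the known points only, then evaluate on the same x scale for every row.
known = data["value"].notna().to_numpy()
f = interp1d(
    month_number[known],
    data["value"].to_numpy()[known],
    kind="linear",
    fill_value="extrapolate",
)
data["value_interpolated"] = f(month_number)
print(data)
```

Because months are equally spaced here, each interior gap is filled with the midpoint of its neighbours, whereas the seconds-based version above weights by the actual number of days in each month. Which one is appropriate depends on whether the data are better thought of as per-month or per-day quantities.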