python piecewise linear interpolation across dataframes in a list
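Question
I am trying to apply piecewise linear interpolation. I first tried pandas' built-in interpolate function, but it did not work.
The example data looks like this: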
import pandas as pd
import numpy as np
d = {'ID':[5,5,5,5,5,5,5], 'month':[0,3,6,9,12,15,18], 'num':[7,np.nan,5,np.nan,np.nan,5,8]}
tempo = pd.DataFrame(data = d)
d2 = {'ID':[6,6,6,6,6,6,6], 'month':[0,3,6,9,12,15,18], 'num':[5,np.nan,2,np.nan,np.nan,np.nan,7]}
tempo2 = pd.DataFrame(data = d2)
this = []
this.append(tempo)
this.append(tempo2)
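The actual data has over 1000 unique IDs, so I filtered each ID into its own dataframe and put the dataframes into a list.
The first dataframe in the list looks like this:
ID | month | num
5 | 0 | 7
5 | 3 | NaN
5 | 6 | 5
5 | 9 | NaN
5 | 12 | NaN
5 | 15 | 5
5 | 18 | 8
I am trying to go through all the dataframes in the list and apply piecewise linear interpolation to each one. I tried setting month as the index and calling .interpolate(method='index', inplace=True), but it did not work.
The expected output is: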
ID | month | num
5 | 0 | 7
5 | 3 | 6
5 | 6 | 5
5 | 9 | 5
5 | 12 | 5
5 | 15 | 5
5 | 18 | 8
This needs to be applied to every dataframe in the list.
Answer 1
Score: 2
Assuming this is a follow-up to your previous question, change the code to:
for i, df in enumerate(this):
    this[i] = (df
               .set_index('month')
               # optional, because of the previous question
               .reindex(range(df['month'].min(), df['month'].max()+3, 3))
               .interpolate()
               .reset_index()[df.columns]
              )
NB. I simplified the code by removing the groupby. This simplification only works if each DataFrame contains a single group (one ID), as you mentioned in the other question.
Output:
[   ID  month  num
 0   5      0  7.0
 1   5      3  6.0
 2   5      6  5.0
 3   5      9  5.0
 4   5     12  5.0
 5   5     15  5.0
 6   5     18  8.0,
    ID  month   num
 0   6      0  5.00
 1   6      3  3.50
 2   6      6  2.00
 3   6      9  3.25
 4   6     12  4.50
 5   6     15  5.75
 6   6     18  7.00]
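Note that .interpolate() with no arguments uses method='linear', which ignores the index and treats the rows as equally spaced. That matches the sample data, where the months form a regular 3-month grid. If the months were unevenly spaced, method='index' would interpolate against the actual month values instead. A small sketch with made-up values (the series `s` is hypothetical and not taken from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical, unevenly spaced months: 0, 3 and 12.
s = pd.Series([0.0, np.nan, 12.0], index=[0, 3, 12])

by_position = s.interpolate()             # default 'linear': rows treated as equally spaced
by_month = s.interpolate(method='index')  # interpolates against the numeric index values

print(by_position.loc[3])  # 6.0, halfway between the neighbouring rows
print(by_month.loc[3])     # 3.0, a quarter of the way from month 0 to month 12
```

On a regular grid such as the one in the question, both calls give the same result, so the choice only matters when the month spacing varies.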