Python: piecewise linear interpolation across DataFrames in a list

Question

I am trying to apply piecewise linear interpolation. I first tried pandas' built-in interpolate function, but it did not work.

The example data is shown below:

    import pandas as pd
    import numpy as np

    d = {'ID':[5,5,5,5,5,5,5], 'month':[0,3,6,9,12,15,18], 'num':[7,np.nan,5,np.nan,np.nan,5,8]}
    tempo = pd.DataFrame(data=d)
    d2 = {'ID':[6,6,6,6,6,6,6], 'month':[0,3,6,9,12,15,18], 'num':[5,np.nan,2,np.nan,np.nan,np.nan,7]}
    tempo2 = pd.DataFrame(data=d2)
    this = []
    this.append(tempo)
    this.append(tempo2)

The actual data has over 1000 unique IDs, so I filtered each ID into a dataframe and put them into the list.

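The first DataFrame in the list (tempo, as built above) looks like this:

| ID | month | num |
|----|-------|-----|
| 5 | 0 | 7 |
| 5 | 3 | NaN |
| 5 | 6 | 5 |
| 5 | 9 | NaN |
| 5 | 12 | NaN |
| 5 | 15 | 5 |
| 5 | 18 | 8 |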

I am trying to loop over all the DataFrames in the list and apply piecewise linear interpolation to each one. I tried setting month as the index and calling .interpolate(method='index', inplace=True), but it did not work.

The expected output is:

| ID | month | num |
|----|-------|-----|
| 5 | 0 | 7 |
| 5 | 3 | 6 |
| 5 | 6 | 5 |
| 5 | 9 | 5 |
| 5 | 12 | 5 |
| 5 | 15 | 5 |
| 5 | 18 | 8 |

This needs to be applied across all the dataframes in the list.
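On the `.interpolate(method='index')` attempt: with `month` as the index, `method='index'` interpolates against the month values themselves, whereas the default `method='linear'` treats the rows as equally spaced. The sketch below uses hypothetical, unevenly spaced months (not the data from this question) to show the difference. It assigns the result to a new variable instead of relying on `inplace=True`.

```python
import numpy as np
import pandas as pd

# Hypothetical data with unevenly spaced months (an assumption for illustration).
df = pd.DataFrame({'month': [0, 3, 12], 'num': [0.0, np.nan, 12.0]})
s = df.set_index('month')['num']

# Default linear method: ignores the index and treats rows as equally spaced.
by_position = s.interpolate()

# method='index': uses the month values as the x-coordinates.
by_month = s.interpolate(method='index')

print(by_position.tolist())  # [0.0, 6.0, 12.0]
print(by_month.tolist())     # [0.0, 3.0, 12.0]
```

With the question's data the months are evenly spaced (every 3), so both methods give the same values there.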

Answer 1

Score: 2

Assuming this is a follow-up to your previous question, change the code to:

    for i, df in enumerate(this):
        this[i] = (df
            .set_index('month')
            # optional, because of the previous question
            .reindex(range(df['month'].min(), df['month'].max()+3, 3))
            .interpolate()
            .reset_index()[df.columns]
        )

NB. I simplified the code to remove the groupby, which only works if you have a single group per DataFrame, as you mentioned in the other question.
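For comparison, here is a minimal sketch of a groupby-based variant that works on one combined DataFrame instead of a list. It is not the code from the other question (which is not shown here); the name `combined` is hypothetical, and the sketch assumes the months within each ID are sorted and evenly spaced, as in the example data.

```python
import numpy as np
import pandas as pd

months = [0, 3, 6, 9, 12, 15, 18]
tempo = pd.DataFrame({'ID': [5] * 7, 'month': months,
                      'num': [7, np.nan, 5, np.nan, np.nan, 5, 8]})
tempo2 = pd.DataFrame({'ID': [6] * 7, 'month': months,
                       'num': [5, np.nan, 2, np.nan, np.nan, np.nan, 7]})

# Stack all IDs into one frame and make sure rows are ordered by month within each ID.
combined = (pd.concat([tempo, tempo2], ignore_index=True)
              .sort_values(['ID', 'month'])
              .reset_index(drop=True))

# Interpolate 'num' separately per ID so values never leak across groups.
combined['num'] = (combined
                   .groupby('ID')['num']
                   .transform(lambda s: s.interpolate()))

print(combined)
```

This avoids building more than 1000 separate DataFrames, at the cost of needing the groupby that the loop version drops.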

Output (the updated list `this`):

    [   ID  month  num
    0   5      0  7.0
    1   5      3  6.0
    2   5      6  5.0
    3   5      9  5.0
    4   5     12  5.0
    5   5     15  5.0
    6   5     18  8.0,
        ID  month   num
    0   6      0  5.00
    1   6      3  3.50
    2   6      6  2.00
    3   6      9  3.25
    4   6     12  4.50
    5   6     15  5.75
    6   6     18  7.00]
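The values for ID 6 can be cross-checked with `numpy.interp`, which does piecewise linear interpolation directly from the known (month, num) points. This is a small verification sketch, not part of the answer's method; the variable names `known` and `filled` are made up for illustration.

```python
import numpy as np
import pandas as pd

tempo2 = pd.DataFrame({'ID': [6] * 7,
                       'month': [0, 3, 6, 9, 12, 15, 18],
                       'num': [5, np.nan, 2, np.nan, np.nan, np.nan, 7]})

# Rows where 'num' is known serve as the interpolation nodes.
known = tempo2['num'].notna()

# Evaluate the piecewise linear function through the known points at every month.
filled = np.interp(tempo2['month'].to_numpy(),
                   tempo2.loc[known, 'month'].to_numpy(),
                   tempo2.loc[known, 'num'].to_numpy())

print(filled)
```

Between month 6 (value 2) and month 18 (value 7) the slope is 5/12 per month, so each 3-month step adds 1.25: 3.25, 4.50, 5.75, which agrees with the output above.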

huangapple
  • Posted on 2023-02-10 13:24:53
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75407266.html