Pandas/matplotlib newbie: aggregating time series data with differing indices?

Question

I'm getting to grips with pandas/matplotlib, and looking to aggregate multiple data series with (marginally) differing indices. For example:

Series 1

seconds_since_start Value
0.0 35
0.8 41
1.1 48

Series 2

seconds_since_start Value
0.0 31
0.7 37
1.1 41

At present, I'm plotting both series as 2 separate line graphs. Ultimately, I'm looking to create a single line that shows, for any given x value, the mean y of both series. The values between specified points can be assumed to be linear.

I assume this is a common task, but the ways I'm trying involve a lot more complexity than I suspect is necessary.

In short: is there a straightforward way to plot the mean of series that have differing index values?

Notes:

  • While the only immediate need is graphing, ideally the aggregation would be calculated in pandas, not matplotlib
  • The solution will aggregate >100 different series, not just 2

Answer 1

Score: 1

One solution is to take the union of the series' indices and interpolate the values that are missing at any of them. The series can then be concatenated and the mean computed at each index value. The code below assumes the series are in a list called series.

First, get the union of the indices:

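from functools import reduce

import numpy as np
import pandas as pd

# Union of the index values of all series.
# (The list argument was collapsed on the source page; it is reconstructed here from the description above.)
new_index = reduce(np.union1d, [s.index for s in series])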

In the example case, new_index will be array([0. , 0.7, 0.8, 1.1]).
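To try this step, the series list has to exist first, and the answer does not show how it is built. Here is a minimal sketch using the question's sample data. It assumes each series is a pandas Series indexed by seconds_since_start; the names s1 and s2 are placeholders, not taken from the original.

```python
from functools import reduce

import numpy as np
import pandas as pd

# Assumed layout: one pd.Series per data series, indexed by seconds_since_start.
s1 = pd.Series(
    [35, 41, 48],
    index=pd.Index([0.0, 0.8, 1.1], name="seconds_since_start"),
    name="Value",
)
s2 = pd.Series(
    [31, 37, 41],
    index=pd.Index([0.0, 0.7, 1.1], name="seconds_since_start"),
    name="Value",
)
series = [s1, s2]

# np.union1d returns the sorted, de-duplicated union of two arrays;
# reduce folds it over the indices of every series in the list.
new_index = reduce(np.union1d, [s.index for s in series])
print(new_index)
```

The same fold should work unchanged for a list of 100+ series, since reduce only ever combines two indices at a time.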

Now, reindex the series and concat them together:

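# Reindex each series onto the shared index and place them side by side.
# (The list argument was collapsed on the source page; this form, including the Value_i names, is reconstructed.)
df = pd.concat([s.reindex(new_index).rename(f'Value_{i}') for i, s in enumerate(series)], axis=1)
df = df.interpolate('linear')  # fill the gaps left by reindexing
df['Avg'] = df.mean(axis=1)    # row-wise mean across all series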

Result:

                     Value_0  Value_1   Avg
seconds_since_start                        
0.0                     35.0     31.0  33.0
0.7                     38.0     37.0  37.5
0.8                     41.0     39.0  40.0
1.1                     48.0     41.0  44.5
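One detail worth checking in the result above: pandas' 'linear' interpolation method ignores the index and treats rows as equally spaced, which is why Value_0 at 0.7 comes out as 38.0, the midpoint of 35 and 41. If the values should be linear in seconds_since_start, as the question states, method='index' uses the actual index values instead. The sketch below compares the two on the question's sample data. The variable names are illustrative, and it assumes the series are pandas Series with a numeric, sorted index.

```python
from functools import reduce

import numpy as np
import pandas as pd

# Sample data from the question (assumed to be pd.Series indexed by time).
s1 = pd.Series([35, 41, 48], index=[0.0, 0.8, 1.1])
s2 = pd.Series([31, 37, 41], index=[0.0, 0.7, 1.1])
series = [s1, s2]

# Shared index, then one column per series with NaN where a series has no sample.
new_index = reduce(np.union1d, [s.index for s in series])
wide = pd.concat(
    [s.reindex(new_index).rename(f"Value_{i}") for i, s in enumerate(series)],
    axis=1,
)

# 'linear' treats rows as equally spaced; 'index' weights by the index values.
positional = wide.interpolate(method="linear")
by_index = wide.interpolate(method="index")

# Mean across series at every x value, computed entirely in pandas.
mean_by_index = by_index.mean(axis=1)

print(positional)
print(by_index)
print(mean_by_index)
```

With index-aware interpolation, Value_0 at 0.7 becomes 35 + 6 × (0.7 / 0.8) = 40.25 instead of 38.0, so the averaged line differs from the table above at the two interior points.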

Answer 2

Score: 0

You can use pd.concat to combine your 100+ series, then group by seconds_since_start and compute the mean:

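import pandas as pd

dfs = [df1, df2]  # all your data here: one DataFrame per series
# Stack the frames row-wise, then average the rows that share a timestamp
df = pd.concat(dfs, axis=0).groupby('seconds_since_start', as_index=False)['Value'].mean()
df.plot(x='seconds_since_start', y='Value', marker='o')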

Output:

>>> df
   seconds_since_start  Value
0                  0.0   33.0
1                  0.7   37.0
2                  0.8   41.0
3                  1.1   44.5
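A self-contained version of this approach with the question's sample data may make its behaviour easier to see. It assumes each series is a DataFrame with columns seconds_since_start and Value; the construction of df1 and df2 is not shown in the answer, so it is a guess. Plotting is left out so the snippet needs only pandas.

```python
import pandas as pd

# Assumed layout: one DataFrame per series, with the time as an ordinary column.
df1 = pd.DataFrame({"seconds_since_start": [0.0, 0.8, 1.1], "Value": [35, 41, 48]})
df2 = pd.DataFrame({"seconds_since_start": [0.0, 0.7, 1.1], "Value": [31, 37, 41]})

# Stack all rows, then average the rows that share an identical timestamp.
stacked = pd.concat([df1, df2], axis=0)
mean_exact = stacked.groupby("seconds_since_start", as_index=False)["Value"].mean()
print(mean_exact)
```

Note that grouping averages only rows with exactly equal timestamps. At 0.7 and 0.8 a single series contributes, so the value there is that one observation, not a mean of interpolated values.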

huangapple
  • Posted on March 7, 2023, 13:40:14
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75658389.html