Pandas/matplotlib newbie: aggregating time series data with differing indices?
Question
I'm getting to grips with pandas/matplotlib, and looking to aggregate multiple data series with (marginally) differing indices. For example:
Series 1
seconds_since_start | Value |
---|---|
0.0 | 35 |
0.8 | 41 |
1.1 | 48 |
Series 2
seconds_since_start | Value |
---|---|
0.0 | 31 |
0.7 | 37 |
1.1 | 41 |
At present, I'm plotting both series as 2 separate line graphs. Ultimately, I'm looking to create a single line that shows, for any given x value, the mean y of both series. The values between specified points can be assumed to be linear.
I assume this is a common task, but the ways I'm trying involve a lot more complexity than I suspect is necessary.
In short: is there a straightforward way to plot the mean of series that have differing index values?
Notes:
- While the only immediate need is graphing, ideally the aggregation would be calculated in pandas, not matplotlib
- The solution will aggregate >100 different series, not just 2
Answer 1
Score: 1
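One solution is to take the union of the series indices and interpolate the values at any points a series is missing. The series can then be concatenated and the mean computed at each index value. The code below assumes the series are in a list called `series`.
First, get the union of the indices:
import numpy as np
import pandas as pd
from functools import reduce
# Union of all index values (assumes each item in `series` is indexed by seconds_since_start)
new_index = reduce(np.union1d, [s.index for s in series])
In the example case, `new_index` will be `array([0. , 0.7, 0.8, 1.1])`.
Now `reindex` the series and `concat` them together:
# Align every series on the shared index; points a series lacks become NaN
# (assumes the series are named Value_0, Value_1, ... so the columns stay distinct)
df = pd.concat([s.reindex(new_index) for s in series], axis=1)
df = df.interpolate('linear')
df['Avg'] = df.mean(axis=1)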
Result:
Value_0 Value_1 Avg
seconds_since_start
0.0 35.0 31.0 33.0
0.7 38.0 37.0 37.5
0.8 41.0 39.0 40.0
1.1 48.0 41.0 44.5
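One caveat worth checking: per the pandas documentation, `interpolate('linear')` ignores the index and treats the rows as equally spaced, which is why Value_0 at 0.7 comes out as 38.0 (halfway between 35 and 41) in the result above. If the values should be linear in seconds_since_start, as the question states, `method='index'` may be closer to what is wanted. A minimal sketch, assuming the two example series are built by hand (the names `s1`, `s2`, `Value_0` and `Value_1` are illustrative, not from the original post):

```python
import numpy as np
import pandas as pd
from functools import reduce

# Hypothetical sample data mirroring the two tables in the question
s1 = pd.Series([35, 41, 48],
               index=pd.Index([0.0, 0.8, 1.1], name="seconds_since_start"),
               name="Value_0")
s2 = pd.Series([31, 37, 41],
               index=pd.Index([0.0, 0.7, 1.1], name="seconds_since_start"),
               name="Value_1")
series = [s1, s2]

# Union of all index values, then align every series on it (gaps become NaN)
new_index = reduce(np.union1d, [s.index for s in series])
df = pd.concat([s.reindex(new_index) for s in series], axis=1)

# method="index" interpolates using the numeric index values,
# not the row positions
df = df.interpolate(method="index")
df["Avg"] = df.mean(axis=1)
print(df)
```

With this variant, Value_0 at 0.7 should be about 40.25 (35 plus 7/8 of the way to 41) instead of 38.0, and the averages at 0.7 and 0.8 shift accordingly.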
Answer 2
Score: 0
You can use `pd.concat` to combine your 100+ series, then group by `seconds_since_start` before computing the mean:
import pandas as pd
dfs = [df1, df2]  # all your data here
# Stack all frames, then average the rows that share a seconds_since_start value
df = pd.concat(dfs, axis=0).groupby('seconds_since_start', as_index=False)['Value'].mean()
df.plot(x='seconds_since_start', y='Value', marker='o')
Output:
>>> df
seconds_since_start Value
0 0.0 33.0
1 0.7 37.0
2 0.8 41.0
3 1.1 44.5
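Note that this grouping only averages rows whose seconds_since_start values match exactly; it does not interpolate, so a timestamp present in a single series just keeps that series' value (37.0 at 0.7 in the output above). A small sketch to make that visible, assuming `df1` and `df2` are built from the question's tables (the `count` column is an illustrative addition, not part of the original answer):

```python
import pandas as pd

# Hypothetical frames mirroring the two tables in the question
df1 = pd.DataFrame({"seconds_since_start": [0.0, 0.8, 1.1], "Value": [35, 41, 48]})
df2 = pd.DataFrame({"seconds_since_start": [0.0, 0.7, 1.1], "Value": [31, 37, 41]})
dfs = [df1, df2]

# Mean per timestamp, plus how many series actually contributed to it
summary = (
    pd.concat(dfs, axis=0)
    .groupby("seconds_since_start")["Value"]
    .agg(["mean", "count"])
)
print(summary)
```

Rows with a count lower than the number of series are the ones where the mean is based on partial data.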