March 7, 2023 13:40:14

Pandas/matplotlib newbie: aggregating time series data with differing indices?

Question

I'm getting to grips with pandas/matplotlib, and looking to aggregate multiple data series with (marginally) differing indices. For example:

Series 1

seconds_since_start	Value
0.0	35
0.8	41
1.1	48

Series 2

seconds_since_start	Value
0.0	31
0.7	37
1.1	41

At present, I'm plotting both series as 2 separate line graphs. Ultimately, I'm looking to create a single line that shows, for any given x value, the mean y of both series. The values between specified points can be assumed to be linear.

I assume this is a common task, but the ways I'm trying involve a lot more complexity than I suspect is necessary.

In short: is there a straightforward way to plot the mean of series that have differing index values?

Notes:

While the only immediate need is graphing, ideally the aggregation would be calculated in pandas, not matplotlib
The solution will aggregate >100 different series, not just 2

Answer 1

Score: 1

One solution is to take the union of the series' indices and interpolate the values that are missing from each series. The series can then be concatenated and the mean computed at each index value. The code below assumes the series are in a list called series.

First, get the union of the indices:

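from functools import reduce
import numpy as np

# Assumed reconstruction: the argument was lost from the page; per the text it is each series' index
new_index = reduce(np.union1d, [s.index for s in series])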

In the example case, new_index will be array([0. , 0.7, 0.8, 1.1]).
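With 100+ series, the pairwise reduce could be swapped for a single concatenate-then-deduplicate pass. A minimal sketch, assuming each element of series is a pandas Series indexed by seconds_since_start (the sample data comes from the question; np.unique is a swapped-in alternative here, not the call the answer uses):

```python
import numpy as np
import pandas as pd

# Sample data from the question; the list name `series` follows the answer.
series = [
    pd.Series([35, 41, 48], index=[0.0, 0.8, 1.1]),
    pd.Series([31, 37, 41], index=[0.0, 0.7, 1.1]),
]

# Gather every timestamp once, then sort and drop duplicates in one pass.
all_times = np.concatenate([s.index.to_numpy() for s in series])
new_index = np.unique(all_times)
```

np.unique returns the sorted unique values, so new_index holds the same timestamps as the reduce-based version.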

Now, reindex the series and concat them together:

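import pandas as pd

# Assumed reconstruction of the lost argument: each series reindexed to new_index
df = pd.concat([s.reindex(new_index) for s in series], axis=1)
df = df.interpolate('linear')
df['Avg'] = df.mean(axis=1)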

Result:

                     Value_0  Value_1   Avg
seconds_since_start                        
0.0                     35.0     31.0  33.0
0.7                     38.0     37.0  37.5
0.8                     41.0     39.0  40.0
1.1                     48.0     41.0  44.5
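Note that interpolate('linear') ignores the index and treats the rows as equally spaced, which is why Value_0 at 0.7 comes out as 38.0, the midpoint of 35 and 41. If the interpolation should follow the actual time gaps, as the question's linearity assumption suggests, method='index' may be a better fit. A sketch under those assumptions, with the question's data and illustrative Series names:

```python
import pandas as pd

# Sample data from the question; the names Value_0 / Value_1 are illustrative.
s1 = pd.Series([35, 41, 48],
               index=pd.Index([0.0, 0.8, 1.1], name="seconds_since_start"),
               name="Value_0")
s2 = pd.Series([31, 37, 41],
               index=pd.Index([0.0, 0.7, 1.1], name="seconds_since_start"),
               name="Value_1")
series = [s1, s2]

# Union of all timestamps across the series.
new_index = series[0].index
for s in series[1:]:
    new_index = new_index.union(s.index)

# Align every series on the shared index; missing timestamps become NaN.
df = pd.concat([s.reindex(new_index) for s in series], axis=1)

# method="index" weights the interpolation by the numeric index values.
df = df.interpolate(method="index")
avg = df.mean(axis=1)
```

Here the interpolated Value_0 at 0.7 is about 40.25 and Value_1 at 0.8 is about 38.0, so the averages at those two points become roughly 38.625 and 39.5.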

Answer 2

Score: 0

You can use pd.concat to combine your 100+ series, then group by seconds_since_start before computing the mean:

import pandas as pd

dfs = [df1, df2]  # all your data here
df = pd.concat(dfs, axis=0).groupby('seconds_since_start', as_index=False)['Value'].mean()
df.plot(x='seconds_since_start', y='Value', marker='o')

Output:

>>> df
   seconds_since_start  Value
0                  0.0   33.0
1                  0.7   37.0
2                  0.8   41.0
3                  1.1   44.5
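The grouped mean only averages samples that share exactly the same timestamp, so at 0.7 and 0.8 the output above is a single series' value rather than an interpolated mean. If an average at arbitrary x values is wanted, one option is to resample every series onto a shared grid with np.interp first. A sketch, assuming df1 and df2 hold the question's data; the 12-point grid and the name mean_df are arbitrary choices for illustration:

```python
import numpy as np
import pandas as pd

# Sample data from the question, in the long format this answer uses.
df1 = pd.DataFrame({"seconds_since_start": [0.0, 0.8, 1.1], "Value": [35, 41, 48]})
df2 = pd.DataFrame({"seconds_since_start": [0.0, 0.7, 1.1], "Value": [31, 37, 41]})
dfs = [df1, df2]

# Regular grid spanning the observed time range (12 points is an assumption).
start = min(d["seconds_since_start"].min() for d in dfs)
stop = max(d["seconds_since_start"].max() for d in dfs)
grid = np.linspace(start, stop, 12)

# Linearly interpolate each series onto the grid. np.interp expects increasing
# x values and holds the end values constant outside a series' range.
stacked = np.vstack([
    np.interp(grid, d["seconds_since_start"].to_numpy(), d["Value"].to_numpy())
    for d in dfs
])

# One row per grid point, averaged across all series.
mean_df = pd.DataFrame({"seconds_since_start": grid, "Value": stacked.mean(axis=0)})
```

mean_df can then be plotted the same way as above, for example with mean_df.plot(x='seconds_since_start', y='Value').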
