February 7, 2023, 03:02:39

Add together elements from Pandas DataFrame based on timestamp

Question

I am trying to add together the values in the second column of two DataFrames wherever the time (in the first column) is the same. However, the timestamps in each DataFrame are spaced at different intervals. So, in the image below, I would like to add the y values of both lines together:

[image not available]

Where they overlap, the combined value would be around 3200.

Each DataFrame has two columns: the first is the time as a Unix timestamp and the second is the power in watts. The spacing between rows is usually 6 seconds, but sometimes more or less. Also, each DataFrame starts and ends at a different time, although there is some overlap in the inner portion.

I've added the first few rows for ease of viewing:

df1:

             time  power
0      1355526770  1500
1      1355526776  1800
2      1355526782  1600
3      1355526788  1700
4      1355526794  1400

df2:

             time  power
0      1355526771  1250
1      1355526777  1200
2      1355526783  1280
3      1355526789  1290
4      1355526795  1300

I first thought to reindex each DataFrame, inserting a row for every second across its time range, and then linearly interpolate the power values between the original timestamps. Then I would add the DataFrames together by summing the power values where the timestamps match exactly.

The problem with this method is that it would increase the size of each DataFrame by at least 6x, and since they're already pretty big, this would slow things down a lot.

If anyone knows another method to do this, I would be very grateful.
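
The interpolation idea described above does not strictly need a one-second grid. As a rough sketch (not taken from either answer), one could interpolate both series only at the timestamps that already occur in either DataFrame, using `numpy.interp`. The choice to restrict the result to the overlapping time window, and the names `times` and `combined`, are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Sample rows from the question.
df1 = pd.DataFrame({
    "time": [1355526770, 1355526776, 1355526782, 1355526788, 1355526794],
    "power": [1500, 1800, 1600, 1700, 1400],
})
df2 = pd.DataFrame({
    "time": [1355526771, 1355526777, 1355526783, 1355526789, 1355526795],
    "power": [1250, 1200, 1280, 1290, 1300],
})

# Sorted, unique timestamps that actually occur in either frame.
times = np.union1d(df1["time"].to_numpy(), df2["time"].to_numpy())

# Assumption: keep only the window where both series have data,
# so that no value is extrapolated past the end of a series.
start = max(df1["time"].min(), df2["time"].min())
end = min(df1["time"].max(), df2["time"].max())
times = times[(times >= start) & (times <= end)]

# np.interp expects increasing x values, so each frame must be sorted by time.
p1 = np.interp(times, df1["time"].to_numpy(), df1["power"].to_numpy())
p2 = np.interp(times, df2["time"].to_numpy(), df2["power"].to_numpy())

combined = pd.DataFrame({"time": times, "power": p1 + p2})
print(combined)
```

With this approach the result has at most as many rows as the two inputs combined, rather than one row per second.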

Answer 1

Score: 1

Beyond what the other users have said, you could also consider trying Modin instead of plain pandas for your datasets if you want another way to speed up computation. Modin is designed as a drop-in replacement: typically you only change one line, the import (import modin.pandas as pd). Take a look here: Intel® Distribution of Modin

Answer 2

Score: 0

Use merge_asof to align on the nearest time:

import pandas as pd

# For each row of df1, take the df2 row with the nearest time, then sum the two power columns.
(pd.merge_asof(df1, df2, on='time', direction='nearest', suffixes=(None, '_2'))
   .assign(power=lambda d: d['power'].add(d.pop('power_2')))
)

Output:

         time  power
0  1355526770   2750
1  1355526776   3000
2  1355526782   2880
3  1355526788   2990
4  1355526794   2700
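
Two details of the approach above may matter on the full data: `merge_asof` requires both frames to be sorted by the key, and with `direction='nearest'` every row of `df1` is matched even when the closest `df2` reading is far away (for example, outside the overlapping time range). The sketch below is one possible variant that sorts explicitly and uses the `tolerance` parameter to discard distant matches; the 3-second threshold and the `_1`/`_2` suffixes are assumptions chosen for illustration, not values from the question.

```python
import pandas as pd

# Sample rows from the question.
df1 = pd.DataFrame({
    "time": [1355526770, 1355526776, 1355526782, 1355526788, 1355526794],
    "power": [1500, 1800, 1600, 1700, 1400],
})
df2 = pd.DataFrame({
    "time": [1355526771, 1355526777, 1355526783, 1355526789, 1355526795],
    "power": [1250, 1200, 1280, 1290, 1300],
})

# merge_asof requires both frames to be sorted by the key column.
df1 = df1.sort_values("time")
df2 = df2.sort_values("time")

merged = pd.merge_asof(
    df1, df2,
    on="time",
    direction="nearest",
    tolerance=3,              # assumed threshold: ignore df2 readings more than 3 s away
    suffixes=("_1", "_2"),
)

# Rows of df1 with no df2 reading within the tolerance have NaN in power_2; drop them.
result = (
    merged.dropna(subset=["power_2"])
          .assign(power=lambda d: d["power_1"] + d["power_2"])
          [["time", "power"]]
)
print(result)
```

On the sample rows every `df1` timestamp has a `df2` reading one second away, so no rows are dropped and the sums match the output shown above.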
