August 10, 2023, 14:50:29

Aggregate measurements in an efficient manner

Question

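I'm trying to sum measurements from a data set. I have one measurement per minute, and I need the sum for each hour over a whole year.

This is what I have at the moment. It works, but it is slow. There may be other problems with it, but it was the approach that made sense to me.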

def aggregate_measurements(tvec, data, period):
    tvec_a = []
    data_a = []
    if period == 'hour':
        for i in range(0, len(tvec), 60):
            timecomp = tvec.iloc[i:i+60]
            datacomp = data.iloc[i:i+60]
            tvec_a.append(timecomp.iloc[0]['year':'second'])
            data_summeret = datacomp.sum()
            data_a.append(data_summeret)
    return tvec_a, data_a

Is there a better way to do this?

Answer 1

Score: 0

You should use vectorized operations whenever possible, for example groupby.

import pandas as pd
# Assumes tvec is a datetime column in your DataFrame; if it is not, convert it first
df['hour'] = df['tvec'].dt.floor('H')  # New column: each timestamp rounded down to the start of its hour
hourly_data = df.groupby('hour')['data'].sum().reset_index()

dt.floor('H') rounds each timestamp down to the start of its hour.
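For anyone who wants to try this end to end, here is a minimal sketch. It assumes, based on the `['year':'second']` slice in the question, that `tvec` is a DataFrame with one column per time component (`year`, `month`, `day`, `hour`, `minute`, `second`); that layout and the sample values are my guess, not something the question states. It uses the lowercase `'h'` alias because recent pandas releases deprecate the uppercase `'H'`.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: two hours of per-minute readings, each equal to 1.0.
timestamps = pd.date_range("2023-01-01 00:00:00", periods=120, freq="min")

# Assumed layout: tvec has one column per time component (year ... second).
tvec = pd.DataFrame({
    "year": timestamps.year,
    "month": timestamps.month,
    "day": timestamps.day,
    "hour": timestamps.hour,
    "minute": timestamps.minute,
    "second": timestamps.second,
})
data = pd.Series(np.ones(120), name="data")

# pd.to_datetime can assemble datetimes from year/month/day/... columns.
df = pd.DataFrame({"tvec": pd.to_datetime(tvec), "data": data})

# Floor each timestamp to the start of its hour, then sum within each hour.
df["hour"] = df["tvec"].dt.floor("h")
hourly_data = df.groupby("hour")["data"].sum().reset_index()

# Alternative: resample on a datetime index gives the same hourly sums.
hourly_resampled = df.set_index("tvec")["data"].resample("h").sum()

print(hourly_data)
```

One difference worth knowing: groupby only returns hours that actually occur in the data, while resample also emits the empty hours in between (with a sum of 0), which may or may not be what you want if the measurements have gaps.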

