Aggregate measurements in an efficient manner
Question
I'm trying to add together measurements from a data set. I have one measurement per minute, and I need to find the sum for every hour of a whole year.
This is what I have at the moment. It works, but it is slow. There might be other problems with it, but it was what made sense to me.
def aggregate_measurements(tvec, data, period):
    tvec_a = []
    data_a = []
    if period == 'hour':
        for i in range(0, len(tvec), 60):
            timecomp = tvec.iloc[i:i+60]
            datacomp = data.iloc[i:i+60]
            tvec_a.append(timecomp.iloc[0]['year':'second'])
            data_summeret = datacomp.sum()
            data_a.append(data_summeret)
    return tvec_a, data_a
Is there a better way to do this?
Answer 1
Score: 0
You should use vectorized operations whenever possible, for example groupby:
import pandas as pd
# Assumes df has a datetime column 'tvec' and a value column 'data'.
# If 'tvec' is not datetime yet, convert it first with pd.to_datetime.
df['hour'] = df['tvec'].dt.floor('H')  # Timestamp truncated to the start of its hour
hourly_data = df.groupby('hour')['data'].sum().reset_index()
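dt.floor('H') rounds each timestamp down to the start of its hour, so all 60 minutes of an hour share one group key. Recent pandas versions spell this alias 'h'; the uppercase 'H' is deprecated there.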
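To connect this back to the question's two inputs, here is a minimal, self-contained sketch. It assumes tvec is a DataFrame with the columns year, month, day, hour, minute and second (suggested by the 'year':'second' slice in the question), that data has one row per row of tvec, and it invents the function name aggregate_hourly and the column names zone1 and zone2 purely for illustration.

```python
import numpy as np
import pandas as pd


def aggregate_hourly(tvec: pd.DataFrame, data: pd.DataFrame) -> pd.DataFrame:
    """Sum every column of `data` per clock hour.

    Assumption: `tvec` has the columns year, month, day, hour, minute, second
    and its rows line up one-to-one with the rows of `data`.
    """
    # pd.to_datetime can assemble timestamps from year/month/day/... columns.
    stamps = pd.to_datetime(
        tvec[["year", "month", "day", "hour", "minute", "second"]]
    )
    # Floor each timestamp to the start of its hour and use it as the group key.
    # to_numpy() avoids index alignment between `tvec` and `data`.
    hour_start = stamps.dt.floor("h").to_numpy()
    return data.groupby(hour_start).sum()


# Hypothetical sample input: three hours of per-minute measurements.
minutes = pd.date_range("2023-01-01 00:00", periods=180, freq="min")
tvec = pd.DataFrame({
    "year": minutes.year, "month": minutes.month, "day": minutes.day,
    "hour": minutes.hour, "minute": minutes.minute, "second": minutes.second,
})
data = pd.DataFrame({"zone1": np.ones(180), "zone2": np.arange(180.0)})

hourly = aggregate_hourly(tvec, data)
print(hourly)
```

Unlike slicing in fixed chunks of 60 rows, grouping on the floored timestamp follows the clock, so a missing minute does not shift every later hour.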