Add hours column to regular list of minutes, group by it, and average the data in Python
Question
I have looked for similar questions, but none seems to address the following challenge. I have a pandas dataframe with a list of minutes and corresponding values, like the following:
minute value
0 454
1 434
2 254
The list covers a full year, so it contains 60 minutes * 24 hours * 365 days = 525600 observations.
I would like to add a new column called hour that gives the hour of the day (assuming minutes 0-59 are 12AM, 60-119 are 1AM, and so forth until the following day, where the sequence restarts).
Then, once the hour column is added, I would like to group the observations by it and calculate the average value for each hour of the day across the whole year, ending up with a dataframe of 24 observations, each expressing the average value of the original data at hour n.
Answer 1
Score: 1
Using integer division and the remainder (modulo) operator, you can get the hour.
df['hour'] = df['minute']//60%24
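To see why that expression works, it may help to trace a few minute offsets by hand. The sketch below uses plain Python integers rather than a dataframe; the helper name `hour_of_day` is made up for illustration and is not part of the original answer.

```python
def hour_of_day(minute: int) -> int:
    """Map a minute offset from the start of the year to an hour of day (0-23)."""
    # minute // 60 counts the whole hours elapsed since minute 0;
    # % 24 wraps that count back to 0 at every day boundary.
    return minute // 60 % 24


# Offsets around the hour and day boundaries, plus the last minute of the year.
samples = [0, 59, 60, 119, 1439, 1440, 525599]
hours = [hour_of_day(m) for m in samples]
print(hours)  # [0, 0, 1, 1, 23, 0, 23]
```

Because `//` and `%` have the same precedence and are evaluated left to right, no parentheses are needed around `minute // 60`.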
If you want other date information, it can be useful to take January 1st of some year (not a leap year) as the origin and convert to a datetime. Then you can grab many of the date attributes, in this case the hour.
df['hour'] = pd.to_datetime(df['minute'], unit='m', origin='2017-01-01').dt.hour
Then, for your averages, you get the resulting 24-row Series with:
df.groupby('hour')['value'].mean()
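Putting the pieces of this answer together, a minimal end-to-end sketch might look like the following. The random `value` column is a made-up stand-in for the real data, and the variable names `hour_from_datetime`, `same_hours`, and `hourly_mean` are illustrative only.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real data: one row per minute of a 365-day year.
n_minutes = 60 * 24 * 365
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "minute": np.arange(n_minutes),
    "value": rng.integers(0, 500, size=n_minutes),
})

# Approach 1: integer arithmetic on the minute counter.
df["hour"] = df["minute"] // 60 % 24

# Approach 2: treat the minutes as offsets from the start of a non-leap year.
hour_from_datetime = pd.to_datetime(
    df["minute"], unit="m", origin="2017-01-01"
).dt.hour

# Both approaches should label every row with the same hour.
same_hours = bool((df["hour"] == hour_from_datetime).all())

# One mean per hour of day, pooled over all 365 days.
hourly_mean = df.groupby("hour")["value"].mean()
print(same_hours, len(hourly_mean))
```

If a dataframe is preferred over a Series, calling `.reset_index()` on `hourly_mean` turns the `hour` index into a regular column.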
Answer 2
Score: 1
Here's one way to do it:
import numpy as np
import pandas as pd
# sample df
df = pd.DataFrame({'minute': np.arange(525600), 'value': np.arange(525600)})
# set time format
df['minute'] = pd.to_timedelta(df['minute'], unit='m')
# calculate the mean of each one-hour bin (newer pandas releases prefer the lowercase alias '1h')
df_new = df.groupby(pd.Grouper(key='minute', freq='1H'))['value'].mean().reset_index()
Although you don't need an explicit hour column to calculate these values, you can get one with:
# read the hour-of-day component directly from the timedelta column
df_new['hour'] = df_new['minute'].dt.components.hours
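Note that an hourly `pd.Grouper` over a year-long timedelta column bins by elapsed time, so it should yield one row per hour of the year (8760 rows) rather than the 24 rows the question asks for. One possible way to reach the 24 hour-of-day averages with the timedelta route is sketched below; it assumes the same sample data as this answer, and the names `elapsed` and `hourly_mean` are illustrative.

```python
import numpy as np
import pandas as pd

# Same sample data as in the answer: one row per minute of a 365-day year.
n_minutes = 60 * 24 * 365
df = pd.DataFrame({"minute": np.arange(n_minutes), "value": np.arange(n_minutes)})

# Convert the minute counter to a timedelta and read off its hour-of-day
# component (0-23); whole days are carried in a separate `days` component.
elapsed = pd.to_timedelta(df["minute"], unit="m")
df["hour"] = elapsed.dt.components.hours

# One mean per hour of day, pooled over all 365 days.
hourly_mean = df.groupby("hour")["value"].mean().reset_index()
print(hourly_mean.shape)  # (24, 2)
```

Grouping on the hour-of-day component rather than on elapsed time is what collapses the 365 days into a single 24-row result.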