Add hours column to regular list of minutes, group by it, and average the data in Python

Question

I have looked for similar questions, but none seem to address the following challenge. I have a pandas DataFrame with a list of minutes and corresponding values, like the following:

    minute  value
    0       454
    1       434
    2       254

The list covers a whole year, so it has 60 minutes * 24 hours * 365 days = 525,600 observations.

I would like to add a new column called hour that expresses the hour of the day (assuming minutes 0-59 are 12 AM, 60-119 are 1 AM, and so forth; on the following day the sequence restarts).

Then, once the hour column is added, I would like to group observations by it and calculate the average value for each hour of the day across the whole year, ending up with a DataFrame of 24 observations, each expressing the average value of the original data at hour n.

Answer 1

Score: 1

Using integer division and the remainder (modulo) operator, you can get the hour:

    df['hour'] = df['minute'] // 60 % 24
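To see what the two operators do, here is a small check on a few hand-picked minute offsets (illustrative values, not the asker's data). Floor division by 60 turns a minute offset into whole hours since minute 0, and % 24 wraps that count back into the hour of the day, so the first minute of the second day maps to 0 again.

```python
import pandas as pd

# Hand-picked minute offsets (illustrative only): start of the year, last
# minute of the first hour, first minute of the second hour, last minute of
# day 1, first minute of day 2, and the last minute of a 365-day year.
minutes = pd.Series([0, 59, 60, 1439, 1440, 525599], name='minute')

# Floor division gives whole hours since minute 0; modulo 24 wraps that
# count into the hour of the day (0 = 12 AM, 23 = 11 PM).
hours = minutes // 60 % 24

print(hours.tolist())  # [0, 0, 1, 23, 0, 23]
```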

If you want other date information, it can be useful to use January 1 of some non-leap year as the origin and convert to a datetime. Then you can access many date attributes, in this case hour.

    df['hour'] = pd.to_datetime(df['minute'], unit='m', origin='2017-01-01').dt.hour
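A minimal sketch, assuming a 365-day year of minute offsets starting at 0 as in the question, that checks the datetime route against the arithmetic one. The hour of day does not depend on which year is used as the origin; the non-leap 2017-01-01 origin only matters for date fields such as month or day, which then line up with a 365-day calendar.

```python
import numpy as np
import pandas as pd

# One row per minute of a 365-day year, as described in the question.
df = pd.DataFrame({'minute': np.arange(60 * 24 * 365)})

# Approach 1: integer and remainder division.
hour_arith = df['minute'] // 60 % 24

# Approach 2: treat each minute as an offset from midnight on 2017-01-01
# (a non-leap year) and read the hour from the resulting datetimes.
stamps = pd.to_datetime(df['minute'], unit='m', origin='2017-01-01')
hour_dt = stamps.dt.hour

# Both approaches give the same hour of day for every row.
print((hour_arith == hour_dt).all())  # True

# The datetime version also exposes calendar fields: the last minute of the
# data falls on the last day of 2017.
print(stamps.iloc[-1])  # 2017-12-31 23:59:00
```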

Then, for your averages, you get the resulting 24-row Series with:

    df.groupby('hour')['value'].mean()
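Putting the pieces together, a sketch of the full pipeline on synthetic data. The values are an assumption made up for illustration: each value is set to the minute of the day, so the average for hour h over the year is known in advance to be 60h + 29.5 (the mean of minutes 60h through 60h + 59).

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the question's data (the real values are unknown):
# each value is the minute of the day, 0..1439.
minutes = np.arange(60 * 24 * 365)
df = pd.DataFrame({'minute': minutes, 'value': minutes % (60 * 24)})

# Hour of the day via integer and remainder division.
df['hour'] = df['minute'] // 60 % 24

# Average over all 365 days for each hour of the day: a 24-row Series.
hourly_mean = df.groupby('hour')['value'].mean()

print(len(hourly_mean))     # 24
print(hourly_mean.loc[0])   # 29.5
print(hourly_mean.loc[23])  # 1409.5
```

Each hour of the day contributes exactly 60 * 365 rows, so every group in this mean has the same weight.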

Answer 2

Score: 1

Here's a way to do it:

    import numpy as np
    import pandas as pd

    # sample df: one value per minute of a 365-day year
    df = pd.DataFrame({'minute': np.arange(525600), 'value': np.arange(525600)})

    # convert the minute offsets to timedeltas
    df['minute'] = pd.to_timedelta(df['minute'], unit='m')

    # mean of each 1-hour bin of the year
    df_new = df.groupby(pd.Grouper(key='minute', freq='1h'))['value'].mean().reset_index()

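You don't need an explicit hour column to calculate these values. Note, however, that this gives one mean for each of the 8,760 hours of the year rather than the 24 rows the question asks for. If you want the hour of the day, you can get it with: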

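    # pd.to_datetime raises a TypeError on timedelta values, so read the hour
    # from the part of each offset below one day instead
    df_new['hour'] = df_new['minute'].dt.seconds // 3600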
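The grouping in this answer stops at one mean per hour of the year. A sketch of the extra step that brings it down to the 24 rows the question asks for, reusing the answer's sample data. It takes the hour of day from the timedelta's .dt.seconds (the part of each offset below one day) and then averages the hourly bins by that hour. Because every bin holds exactly 60 minutes, the mean of the bin means equals the mean over the raw minutes, which the last lines cross-check.

```python
import numpy as np
import pandas as pd

# Same sample data as in this answer: one value per minute of a 365-day year.
df = pd.DataFrame({'minute': np.arange(525600), 'value': np.arange(525600)})
df['minute'] = pd.to_timedelta(df['minute'], unit='m')

# Step 1 (as in this answer): one mean per hour of the year.
df_new = df.groupby(pd.Grouper(key='minute', freq='1h'))['value'].mean().reset_index()

# Hour of the day: .dt.seconds is the within-day part of each timedelta,
# so integer division by 3600 gives 0..23.
df_new['hour'] = df_new['minute'].dt.seconds // 3600

# Step 2: average the hourly bins by hour of day, giving 24 rows.
by_hour = df_new.groupby('hour')['value'].mean()

# Cross-check against grouping the raw minutes directly by hour of day.
direct = df.groupby(df['minute'].dt.seconds // 3600)['value'].mean()

print(len(df_new), len(by_hour))                            # 8760 24
print(np.allclose(by_hour.to_numpy(), direct.to_numpy()))   # True
```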

huangapple
  • Posted on January 3, 2020 at 23:10:30
  • When reposting, please keep the link to this article: https://go.coder-hub.com/59580941.html