Posted: 2020-01-03 23:10:30

Add hours column to regular list of minutes, group by it, and average the data in Python

Question

I have looked for similar questions, but none seems to address the following challenge. I have a pandas dataframe with a list of minutes and corresponding values, like the following:

minute value
0        454
1        434
2        254

The list is a year-long list, thus counting 60 minutes * 24 hours * 365 days = 525600 observations.

I would like to add a new column called hour, which gives the hour of the day (assuming minutes 0-59 are 12AM, 60-119 are 1AM, and so forth until the following day, where the sequence restarts).

Then, once the hour column is added, I would like to group the observations by it and calculate the average value for each hour of the day across the whole year, ending up with a dataframe of 24 observations, each holding the average of the original data at hour n.

Answer 1

Score: 1

Using integer and remainder division you can get the hour.

df['hour'] = df['minute'] // 60 % 24

If you want other date information it can be useful to use January 1st of some year (not a leap year) as the origin and convert to a datetime. Then you can grab a lot of the date attributes, in this case hour.

df['hour'] = pd.to_datetime(df['minute'], unit='m', origin='2017-01-01').dt.hour
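
To illustrate the "other date information" point, here is a minimal sketch. The sample frame and the extra column names (`month`, `dayofweek`) are assumptions made for illustration, not part of the question's data. It anchors minute 0 at midnight on 2017-01-01 and checks that the datetime-derived hour matches the integer-division formula.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the year of minute data (one row per minute).
df = pd.DataFrame({"minute": np.arange(60 * 24 * 365)})

# Anchor minute 0 at midnight on 1 January of a non-leap year.
ts = pd.to_datetime(df["minute"], unit="m", origin="2017-01-01")

df["hour"] = ts.dt.hour
df["month"] = ts.dt.month          # illustrative extra attribute
df["dayofweek"] = ts.dt.dayofweek  # Monday=0 ... Sunday=6

# The datetime-derived hour agrees with the integer-division formula.
same = bool((df["hour"] == df["minute"] // 60 % 24).all())
print(same)
```

The choice of origin year only matters for calendar attributes such as the weekday. The hour of day is the same for any origin that starts at midnight.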

Then for your averages you get the resulting 24 row Series with:

df.groupby('hour')['value'].mean()
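
Putting the pieces together, an end-to-end sketch might look like the following. The synthetic `value` column (the minute of the day) and the output column name `mean_value` are assumptions, chosen so that the expected mean for hour h, 60*h + 29.5, is easy to check by hand.

```python
import numpy as np
import pandas as pd

n = 60 * 24 * 365  # 525600 one-minute observations

# Synthetic data: value is the minute of the day (0..1439), repeated daily.
df = pd.DataFrame({"minute": np.arange(n), "value": np.arange(n) % 1440})

# Hour of day: whole hours elapsed, wrapped every 24 hours.
df["hour"] = df["minute"] // 60 % 24

# 24-row Series indexed by hour, then the same result as a dataframe.
hourly_mean = df.groupby("hour")["value"].mean()
hourly_df = hourly_mean.reset_index(name="mean_value")

print(hourly_df.head())
```

If a dataframe rather than a Series is wanted, as in the question, `reset_index` turns the `hour` index back into a regular column.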

Answer 2

Score: 1

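Here's one way to do it:

import numpy as np
import pandas as pd

# sample df
df = pd.DataFrame({'minute': np.arange(525600), 'value': np.arange(525600)})
# convert the minute counter to a timedelta
df['minute'] = pd.to_timedelta(df['minute'], unit='m')
# mean of each consecutive one-hour bin (recent pandas releases prefer the lowercase alias '1h')
df_new = df.groupby(pd.Grouper(key='minute', freq='1H'))['value'].mean().reset_index()

You don't need an explicit hour column to calculate these values, but if you want one, you can get it with:

# hour of day taken from the timedelta bins, without converting to datetime
df_new['hour'] = df_new['minute'].dt.components['hours']

Note that df_new has one row per hour of the year (8760 rows), not per hour of the day, so it still has to be grouped by hour to get the 24 averages the question asks for.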
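
Grouping a timedelta column at a one-hour frequency gives one bin per consecutive hour of the year, so a 365-day series produces 8760 rows rather than 24. A minimal sketch of a timedelta-based variant that collapses straight to hour of day, assuming the 24-row result from the question is what is wanted (the synthetic `value` column is an assumption for illustration):

```python
import numpy as np
import pandas as pd

n = 60 * 24 * 365

# Synthetic data: value is the minute of the day, so hour h averages 60*h + 29.5.
df = pd.DataFrame({"minute": np.arange(n), "value": np.arange(n) % 1440})

# Interpret the minute counter as elapsed time since the start of the year.
elapsed = pd.to_timedelta(df["minute"], unit="m")

# components splits each timedelta into days, hours, minutes, ...;
# the hours part is the hour of day (0..23).
df["hour"] = elapsed.dt.components["hours"]

# One row per hour of day, with the mean over all 365 days.
hourly_df = df.groupby("hour", as_index=False)["value"].mean()

print(hourly_df.shape)
```

This gives the same result as the integer-division approach. The timedelta route is mainly useful when the day or other components are needed as well.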
