February 6, 2023 13:37:25

How to calculate average level on certain days and times in a pandas data frame

Question

I have a data frame like so:

Date_Time	Level
2017-08-08 23:55:01	239.0
2017-08-08 23:50:01	242.0
2017-08-08 23:45:01	246.0
2017-08-08 23:40:01	250.0
2017-08-08 23:35:01	254.0
...	...
2017-07-26 00:23:57	72.0
2017-07-26 00:18:57	67.0
2017-07-26 00:13:57	64.0
2017-07-26 00:08:57	64.0
2017-07-26 00:03:57	65.0

I want to calculate the average level for each day, separately for the overnight hours and the waking hours:

Date	Time	AvgLevel
2017-08-08	00:00:00 - 06:00:00	178
2017-08-08	06:00:01 - 23:59:99	190
2017-09-08	00:00:00 - 06:00:00	174
2017-09-08	06:00:01 - 23:59:99	200

I've already tried splitting the data into separate tables and using for loops, but that uses too much memory and takes too long.

Answer 1

Score: 1

You can use np.where to differentiate between waking hours and overnight hours.

Creating sample data

import numpy as np
import pandas as pd

data = {
    'Date_Time': [
        '2017-08-08 00:00:00', '2017-08-08 23:50:01', '2017-08-08 06:45:01',
        '2017-08-08 06:00:00', '2017-08-08 00:35:01',
        '2017-07-26 00:23:57', '2017-07-26 00:18:57', '2017-07-26 07:13:57',
        '2017-07-26 00:08:57', '2017-07-26 07:03:57'
    ],
    'Level': [239.0, 242.0, 246.0, 250.0, 254.0, 72.0, 67.0, 64.0, 64.0, 65.0]
}

# Parse the timestamps and use them as the index
df = pd.DataFrame(data, columns=['Date_Time', 'Level'])
df['Date_Time'] = pd.to_datetime(df['Date_Time'])
df = df.set_index('Date_Time')
print(df)

                     Level
Date_Time                 
2017-08-08 00:00:00  239.0
2017-08-08 23:50:01  242.0
2017-08-08 06:45:01  246.0
2017-08-08 06:00:00  250.0
2017-08-08 00:35:01  254.0
2017-07-26 00:23:57   72.0
2017-07-26 00:18:57   67.0
2017-07-26 07:13:57   64.0
2017-07-26 00:08:57   64.0
2017-07-26 07:03:57   65.0

Creating a mask of waking hours and overnight hours

# True for rows whose time of day falls between 00:00:00 and 06:00:00 inclusive
mask = (df.index.time >= pd.to_datetime('00:00:00').time()) & (df.index.time <= pd.to_datetime('06:00:00').time())
df['Period'] = np.where(mask, '00:00:00 - 06:00:00', '06:00:01 - 23:59:59')
df

                     Level               Period
Date_Time                                      
2017-08-08 00:00:00  239.0  00:00:00 - 06:00:00
2017-08-08 23:50:01  242.0  06:00:01 - 23:59:59
2017-08-08 06:45:01  246.0  06:00:01 - 23:59:59
2017-08-08 06:00:00  250.0  00:00:00 - 06:00:00
2017-08-08 00:35:01  254.0  00:00:00 - 06:00:00
2017-07-26 00:23:57   72.0  00:00:00 - 06:00:00
2017-07-26 00:18:57   67.0  00:00:00 - 06:00:00
2017-07-26 07:13:57   64.0  06:00:01 - 23:59:59
2017-07-26 00:08:57   64.0  00:00:00 - 06:00:00
2017-07-26 07:03:57   65.0  06:00:01 - 23:59:59

Group by the date and the Period column, then calculate the average Level

result = df.groupby([df.index.date, 'Period'])['Level'].mean().reset_index()
result.columns = ['Date', 'Time', 'AvgLevel']
result

         Date                 Time    AvgLevel
0  2017-07-26  00:00:00 - 06:00:00   67.666667
1  2017-07-26  06:00:01 - 23:59:59   64.500000
2  2017-08-08  00:00:00 - 06:00:00  247.666667
3  2017-08-08  06:00:01 - 23:59:59  244.000000
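
The steps above can be collected into one small helper, which may be handy if the same split is needed on several data frames. This is only a sketch under assumptions: the function name `average_by_period` and its `cutoff` parameter are hypothetical, and the input is assumed to have a DatetimeIndex and a Level column as in the sample data.

```python
import datetime as dt

import numpy as np
import pandas as pd


def average_by_period(df, cutoff=dt.time(6, 0, 0)):
    """Mean Level per calendar date and period (hypothetical helper)."""
    # Times at or before the cutoff count as overnight, as in the mask above.
    overnight = df.index.time <= cutoff
    period = np.where(overnight, "00:00:00 - 06:00:00", "06:00:01 - 23:59:59")
    result = df.groupby([df.index.date, period])["Level"].mean().reset_index()
    result.columns = ["Date", "Time", "AvgLevel"]
    return result


# Same sample rows as in the answer above.
sample = pd.DataFrame(
    {
        "Level": [239.0, 242.0, 246.0, 250.0, 254.0, 72.0, 67.0, 64.0, 64.0, 65.0],
    },
    index=pd.to_datetime(
        [
            "2017-08-08 00:00:00", "2017-08-08 23:50:01", "2017-08-08 06:45:01",
            "2017-08-08 06:00:00", "2017-08-08 00:35:01",
            "2017-07-26 00:23:57", "2017-07-26 00:18:57", "2017-07-26 07:13:57",
            "2017-07-26 00:08:57", "2017-07-26 07:03:57",
        ]
    ),
)

result = average_by_period(sample)
print(result)
```

Because the period label is computed once for the whole index and the averaging is a single groupby, no per-day tables or Python loops are needed.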

Answer 2

Score: 1

You can do the following:

import pandas as pd

df = pd.read_csv("data.csv", sep=";")
print(df)
df["Date_Time"] = pd.to_datetime(df["Date_Time"])

# Split each timestamp into its date and time-of-day parts
df["Date"] = df["Date_Time"].dt.date
df["Time"] = df["Date_Time"].dt.time

# Default to "Overnight"; rows after 06:00:00 become "Waking"
# (06:00:00 itself stays overnight, matching the period labels below)
df["Time_Period"] = "Overnight"
df.loc[(df["Time"] > pd.to_datetime("06:00:00").time()) & (df["Time"] <= pd.to_datetime("23:59:59").time()), "Time_Period"] = "Waking"

grouped = df.groupby(["Date", "Time_Period"])["Level"].mean().reset_index()

grouped = grouped.rename(columns={"Time_Period": "Time", "Level": "AvgLevel"})

grouped["Time"] = grouped["Time"].map({
    "Waking": "06:00:01 - 23:59:99",
    "Overnight": "00:00:00 - 06:00:00"
})

print(grouped)

Basically, you label each entry as overnight or waking based on its time of day, then group by date and period.

This results in the following (I assume the expected outcome you posted is for the entire data frame):

         Date                 Time  AvgLevel
0  2017-07-26  00:00:00 - 06:00:00      66.4
1  2017-08-08  06:00:01 - 23:59:99     246.2
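
To try the approach without a data.csv file, the ten rows visible in the question can be read from an in-memory string. This is a sketch under assumptions: the semicolon-separated layout mirrors the read_csv call above, the rows hidden behind the "..." in the question are not included, and the variable names `csv_text` and `is_waking` are made up for the example. The 06:00:00 boundary is treated as overnight here so that it agrees with the period labels.

```python
import datetime as dt
import io

import pandas as pd

# Only the rows shown in the question; the elided rows are left out.
csv_text = """Date_Time;Level
2017-08-08 23:55:01;239.0
2017-08-08 23:50:01;242.0
2017-08-08 23:45:01;246.0
2017-08-08 23:40:01;250.0
2017-08-08 23:35:01;254.0
2017-07-26 00:23:57;72.0
2017-07-26 00:18:57;67.0
2017-07-26 00:13:57;64.0
2017-07-26 00:08:57;64.0
2017-07-26 00:03:57;65.0
"""

df = pd.read_csv(io.StringIO(csv_text), sep=";", parse_dates=["Date_Time"])

# Anything strictly after 06:00:00 is "Waking"; everything else is "Overnight".
is_waking = df["Date_Time"].dt.time > dt.time(6, 0, 0)
df["Time_Period"] = is_waking.map({True: "Waking", False: "Overnight"})

grouped = (
    df.groupby([df["Date_Time"].dt.date.rename("Date"), "Time_Period"])["Level"]
    .mean()
    .rename("AvgLevel")
    .reset_index()
)
print(grouped)
```

With only these ten rows, each date has a single period, which is why the output above contains two lines rather than four.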

Answer 3

Score: 0

By using the pandas freq option, the mean, sum, etc. can be calculated over equal portions of time, i.e. freq='H' for hourly calculation, freq='12H' for 12-hourly calculation, freq='D' for daily calculation, and freq='BH' for business-hour calculation.
An example is below:

avg_12_hours = df.groupby(pd.Grouper(freq='12H', key='Date_Time'))['Level'].mean()

Since you are asking for calculation periods that are not equal in length, you need to do some custom calculations.
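
One possible form of such a custom calculation is to bin the time of day into unequal intervals with pd.cut and group on the resulting label. This is a sketch, not part of the original answer: it assumes Date_Time is a regular datetime column, reuses the sample timestamps from Answer 1, and the names `seconds`, `period`, and `avg` are chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Date_Time": pd.to_datetime(
            [
                "2017-08-08 00:00:00", "2017-08-08 23:50:01", "2017-08-08 06:45:01",
                "2017-08-08 06:00:00", "2017-08-08 00:35:01",
                "2017-07-26 00:23:57", "2017-07-26 00:18:57", "2017-07-26 07:13:57",
                "2017-07-26 00:08:57", "2017-07-26 07:03:57",
            ]
        ),
        "Level": [239.0, 242.0, 246.0, 250.0, 254.0, 72.0, 67.0, 64.0, 64.0, 65.0],
    }
)

# Seconds since midnight for every timestamp.
t = df["Date_Time"].dt
seconds = t.hour * 3600 + t.minute * 60 + t.second

# Two unequal bins: [0, 21600] is overnight, (21600, 86400] is waking.
period = (
    pd.cut(
        seconds,
        bins=[0, 6 * 3600, 24 * 3600],
        labels=["00:00:00 - 06:00:00", "06:00:01 - 23:59:59"],
        include_lowest=True,
    )
    .astype(str)
    .rename("Time")
)

avg = (
    df.groupby([df["Date_Time"].dt.date.rename("Date"), period])["Level"]
    .mean()
    .rename("AvgLevel")
    .reset_index()
)
print(avg)
```

Adding more periods only requires more bin edges and labels, so the same pattern extends to splits such as morning, afternoon, and night.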
