How to calculate average level on certain days and times in a pandas data frame

Question

I have a data frame like this:

Date_Time            Level
2017-08-08 23:55:01  239.0
2017-08-08 23:50:01  242.0
2017-08-08 23:45:01  246.0
2017-08-08 23:40:01  250.0
2017-08-08 23:35:01  254.0
...                  ...
2017-07-26 00:23:57   72.0
2017-07-26 00:18:57   67.0
2017-07-26 00:13:57   64.0
2017-07-26 00:08:57   64.0
2017-07-26 00:03:57   65.0

I want to calculate the average level for each day, separately for the overnight hours and the waking hours, like this:

Date        Time                 AvgLevel
2017-08-08  00:00:00 - 06:00:00  178
2017-08-08  06:00:01 - 23:59:59  190
2017-09-08  00:00:00 - 06:00:00  174
2017-09-08  06:00:01 - 23:59:59  200

I've already tried splitting the data into separate tables and looping over them, but that uses too much memory and takes too long.

Answer 1

Score: 1

You can use np.where to label each row as belonging to either the waking hours or the overnight hours.

Creating sample data

import numpy as np
import pandas as pd

data = {
    'Date_Time': [
        '2017-08-08 00:00:00', '2017-08-08 23:50:01', '2017-08-08 06:45:01',
        '2017-08-08 06:00:00', '2017-08-08 00:35:01',
        '2017-07-26 00:23:57', '2017-07-26 00:18:57', '2017-07-26 07:13:57',
        '2017-07-26 00:08:57', '2017-07-26 07:03:57'
    ],
    'Level': [239.0, 242.0, 246.0, 250.0, 254.0, 72.0, 67.0, 64.0, 64.0, 65.0]
}

df = pd.DataFrame(data, columns=['Date_Time', 'Level'])
df['Date_Time'] = pd.to_datetime(df['Date_Time'])
df = df.set_index('Date_Time')
print(df)

                     Level
Date_Time                 
2017-08-08 00:00:00  239.0
2017-08-08 23:50:01  242.0
2017-08-08 06:45:01  246.0
2017-08-08 06:00:00  250.0
2017-08-08 00:35:01  254.0
2017-07-26 00:23:57   72.0
2017-07-26 00:18:57   67.0
2017-07-26 07:13:57   64.0
2017-07-26 00:08:57   64.0
2017-07-26 07:03:57   65.0

Create a mask that is True for the overnight hours (00:00:00 to 06:00:00, both inclusive) and use it to label each row:

mask = (df.index.time >= pd.to_datetime('00:00:00').time()) & (df.index.time <= pd.to_datetime('06:00:00').time())
df['Period'] = np.where(mask, '00:00:00 - 06:00:00', '06:00:01 - 23:59:59')
df

                     Level               Period
Date_Time                                      
2017-08-08 00:00:00  239.0  00:00:00 - 06:00:00
2017-08-08 23:50:01  242.0  06:00:01 - 23:59:59
2017-08-08 06:45:01  246.0  06:00:01 - 23:59:59
2017-08-08 06:00:00  250.0  00:00:00 - 06:00:00
2017-08-08 00:35:01  254.0  00:00:00 - 06:00:00
2017-07-26 00:23:57   72.0  00:00:00 - 06:00:00
2017-07-26 00:18:57   67.0  00:00:00 - 06:00:00
2017-07-26 07:13:57   64.0  06:00:01 - 23:59:59
2017-07-26 00:08:57   64.0  00:00:00 - 06:00:00
2017-07-26 07:03:57   65.0  06:00:01 - 23:59:59

Group by the calendar date of Date_Time and by the Period column, then calculate the average Level:

result = df.groupby([df.index.date, 'Period'])['Level'].mean().reset_index()
result.columns = ['Date', 'Time', 'AvgLevel']
result

         Date                 Time    AvgLevel
0  2017-07-26  00:00:00 - 06:00:00   67.666667
1  2017-07-26  06:00:01 - 23:59:59   64.500000
2  2017-08-08  00:00:00 - 06:00:00  247.666667
3  2017-08-08  06:00:01 - 23:59:59  244.000000
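A possible alternative for the mask step, sketched under the assumption that Date_Time is the DatetimeIndex as above: DatetimeIndex.indexer_between_time returns the integer positions of rows whose time of day lies between two times (both ends included by default), so the time objects do not have to be built by hand with pd.to_datetime(...).time(). The resulting Period column should match the one printed above.

```python
import numpy as np
import pandas as pd

# Same ten sample rows as in this answer
data = {
    "Date_Time": [
        "2017-08-08 00:00:00", "2017-08-08 23:50:01", "2017-08-08 06:45:01",
        "2017-08-08 06:00:00", "2017-08-08 00:35:01",
        "2017-07-26 00:23:57", "2017-07-26 00:18:57", "2017-07-26 07:13:57",
        "2017-07-26 00:08:57", "2017-07-26 07:03:57",
    ],
    "Level": [239.0, 242.0, 246.0, 250.0, 254.0, 72.0, 67.0, 64.0, 64.0, 65.0],
}
df = pd.DataFrame(data)
df["Date_Time"] = pd.to_datetime(df["Date_Time"])
df = df.set_index("Date_Time")

# Positions of rows whose time of day is within [00:00, 06:00], both ends inclusive
overnight_pos = df.index.indexer_between_time("00:00", "06:00")

# Turn the positions into a boolean mask
mask = np.zeros(len(df), dtype=bool)
mask[overnight_pos] = True

df["Period"] = np.where(mask, "00:00:00 - 06:00:00", "06:00:01 - 23:59:59")
print(df)
```

The grouping step stays the same as above.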

Answer 2

Score: 1

You can do the following:

import pandas as pd

df = pd.read_csv("data.csv", sep=";")
print(df)
df["Date_Time"] = pd.to_datetime(df["Date_Time"])

df["Date"] = df["Date_Time"].dt.date
df["Time"] = df["Date_Time"].dt.time

# Everything up to and including 06:00:00 is overnight; anything later is waking
df["Time_Period"] = "Overnight"
df.loc[df["Time"] > pd.to_datetime("06:00:00").time(), "Time_Period"] = "Waking"

grouped = df.groupby(["Date", "Time_Period"])["Level"].mean().reset_index()

grouped = grouped.rename(columns={"Time_Period": "Time", "Level": "AvgLevel"})

grouped["Time"] = grouped["Time"].map({
    "Waking": "06:00:01 - 23:59:59",
    "Overnight": "00:00:00 - 06:00:00"
})

print(grouped)

Basically, each entry is labeled as overnight or waking based on its time of day, and the entries are then grouped by date and period.

Run on only the ten rows shown in the question, this gives the following (I assume the expected output in the question refers to the entire data frame):

         Date                 Time  AvgLevel
0  2017-07-26  00:00:00 - 06:00:00      66.4
1  2017-08-08  06:00:01 - 23:59:59     246.2
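data.csv is not included in the question, so here is a self-contained sketch of the same idea that builds the ten rows shown in the question in memory. It uses datetime.time for the 06:00:00 cut-off, treats anything strictly later as waking, and should reproduce the two averages above.

```python
from datetime import time

import pandas as pd

# The ten rows shown in the question (the full data set is not available)
df = pd.DataFrame({
    "Date_Time": [
        "2017-08-08 23:55:01", "2017-08-08 23:50:01", "2017-08-08 23:45:01",
        "2017-08-08 23:40:01", "2017-08-08 23:35:01",
        "2017-07-26 00:23:57", "2017-07-26 00:18:57", "2017-07-26 00:13:57",
        "2017-07-26 00:08:57", "2017-07-26 00:03:57",
    ],
    "Level": [239.0, 242.0, 246.0, 250.0, 254.0, 72.0, 67.0, 64.0, 64.0, 65.0],
})
df["Date_Time"] = pd.to_datetime(df["Date_Time"])

# Strictly after 06:00:00 counts as waking; everything else is overnight
is_waking = df["Date_Time"].dt.time > time(6, 0, 0)
df["Time"] = is_waking.map({True: "06:00:01 - 23:59:59", False: "00:00:00 - 06:00:00"})

# Group by calendar date and period, then average the level
grouped = (
    df.groupby([df["Date_Time"].dt.date.rename("Date"), "Time"])["Level"]
    .mean()
    .reset_index(name="AvgLevel")
)
print(grouped)
```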

Answer 3

Score: 0

By using the pandas freq option, the mean, sum, etc. can be calculated over equal-length time intervals, i.e. freq='H' for hourly, freq='12H' for 12-hourly, freq='D' for daily and freq='BH' for business-hour calculations (pandas 2.2 and later prefer the lowercase aliases 'h', '12h' and 'bh'). An example is below (key='Date_Time' assumes Date_Time is a regular column, not the index):

avg_12_hours = df.groupby(pd.Grouper(freq='12H', key='Date_Time'))['Level'].mean()

Since the periods you are asking about are not of equal length, you need to do some custom calculations.
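One such custom calculation, sketched here as an assumption rather than as part of this answer: convert each timestamp to seconds since midnight and bin it with pd.cut using explicit, unequal bin edges. With the default right=True, the first bin (-1, 21600] covers 00:00:00 to 06:00:00 inclusive and the second covers the rest of the day. The sample rows are the ones used in the first answer.

```python
import pandas as pd

df = pd.DataFrame({
    "Date_Time": pd.to_datetime([
        "2017-08-08 00:00:00", "2017-08-08 23:50:01", "2017-08-08 06:45:01",
        "2017-08-08 06:00:00", "2017-08-08 00:35:01",
        "2017-07-26 00:23:57", "2017-07-26 00:18:57", "2017-07-26 07:13:57",
        "2017-07-26 00:08:57", "2017-07-26 07:03:57",
    ]),
    "Level": [239.0, 242.0, 246.0, 250.0, 254.0, 72.0, 67.0, 64.0, 64.0, 65.0],
})

# Seconds elapsed since midnight for each timestamp
secs = (df["Date_Time"] - df["Date_Time"].dt.normalize()).dt.total_seconds()

# Right-closed bins: (-1, 21600] is 00:00:00-06:00:00, (21600, 86400] is the rest
edges = [-1, 6 * 3600, 24 * 3600]
labels = ["00:00:00 - 06:00:00", "06:00:01 - 23:59:59"]
df["Time"] = pd.cut(secs, bins=edges, labels=labels)

# observed=True keeps only the (date, period) pairs that actually occur
result = (
    df.groupby([df["Date_Time"].dt.date.rename("Date"), "Time"], observed=True)["Level"]
    .mean()
    .reset_index(name="AvgLevel")
)
print(result)
```

More periods (for example a separate evening block) can be added by extending edges and labels, without any loops.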

huangapple
  • Posted on February 6, 2023 at 13:37:25
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75357670.html