How to calculate average level on certain days and times in a pandas data frame

Question

I have a data frame like this:

Date_Time Level
2017-08-08 23:55:01 239.0
2017-08-08 23:50:01 242.0
2017-08-08 23:45:01 246.0
2017-08-08 23:40:01 250.0
2017-08-08 23:35:01 254.0
... ...
2017-07-26 00:23:57 72.0
2017-07-26 00:18:57 67.0
2017-07-26 00:13:57 64.0
2017-07-26 00:08:57 64.0
2017-07-26 00:03:57 65.0

I want to calculate the average level for each day, separately for the overnight hours and the waking hours:

Date        Time                 AvgLevel
2017-08-08  00:00:00 - 06:00:00  178
2017-08-08  06:00:01 - 23:59:59  190
2017-09-08  00:00:00 - 06:00:00  174
2017-09-08  06:00:01 - 23:59:59  200

I've already tried splitting the data into separate tables and using for loops, but that uses too much memory and takes too long.

Answer 1

Score: 1

You can use np.where to differentiate between waking hours and overnight hours.

Creating sample data

    import numpy as np
    import pandas as pd

    data = {
        'Date_Time': [
            '2017-08-08 00:00:00', '2017-08-08 23:50:01', '2017-08-08 06:45:01',
            '2017-08-08 06:00:00', '2017-08-08 00:35:01',
            '2017-07-26 00:23:57', '2017-07-26 00:18:57', '2017-07-26 07:13:57',
            '2017-07-26 00:08:57', '2017-07-26 07:03:57'
        ],
        'Level': [239.0, 242.0, 246.0, 250.0, 254.0, 72.0, 67.0, 64.0, 64.0, 65.0]
    }
    df = pd.DataFrame(data, columns=['Date_Time', 'Level'])
    # Parse the strings and use the timestamps as the index
    df['Date_Time'] = pd.to_datetime(df['Date_Time'])
    df = df.set_index('Date_Time')
    print(df)

Output:

                         Level
    Date_Time
    2017-08-08 00:00:00  239.0
    2017-08-08 23:50:01  242.0
    2017-08-08 06:45:01  246.0
    2017-08-08 06:00:00  250.0
    2017-08-08 00:35:01  254.0
    2017-07-26 00:23:57   72.0
    2017-07-26 00:18:57   67.0
    2017-07-26 07:13:57   64.0
    2017-07-26 00:08:57   64.0
    2017-07-26 07:03:57   65.0

Creating a mask of waking hours and overnight hours

    # True for rows between 00:00:00 and 06:00:00 inclusive (overnight)
    mask = (df.index.time >= pd.to_datetime('00:00:00').time()) & (df.index.time <= pd.to_datetime('06:00:00').time())
    # Label each row with the period it falls into
    df['Period'] = np.where(mask, '00:00:00 - 06:00:00', '06:00:01 - 23:59:59')
    df

Output:

                         Level               Period
    Date_Time
    2017-08-08 00:00:00  239.0  00:00:00 - 06:00:00
    2017-08-08 23:50:01  242.0  06:00:01 - 23:59:59
    2017-08-08 06:45:01  246.0  06:00:01 - 23:59:59
    2017-08-08 06:00:00  250.0  00:00:00 - 06:00:00
    2017-08-08 00:35:01  254.0  00:00:00 - 06:00:00
    2017-07-26 00:23:57   72.0  00:00:00 - 06:00:00
    2017-07-26 00:18:57   67.0  00:00:00 - 06:00:00
    2017-07-26 07:13:57   64.0  06:00:01 - 23:59:59
    2017-07-26 00:08:57   64.0  00:00:00 - 06:00:00
    2017-07-26 07:03:57   65.0  06:00:01 - 23:59:59

Group by the date part of Date_Time and the Period column, then calculate the average Level

    # Group by calendar date and period, then average Level within each group
    result = df.groupby([df.index.date, 'Period'])['Level'].mean().reset_index()
    result.columns = ['Date', 'Time', 'AvgLevel']
    result

Output:

             Date                 Time    AvgLevel
    0  2017-07-26  00:00:00 - 06:00:00   67.666667
    1  2017-07-26  06:00:01 - 23:59:59   64.500000
    2  2017-08-08  00:00:00 - 06:00:00  247.666667
    3  2017-08-08  06:00:01 - 23:59:59  244.000000
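
The same steps can also be wrapped in one function that works when Date_Time is an ordinary column rather than the index, as in the question. This is only a sketch: the helper name average_by_period and its night_end_seconds parameter are made up for illustration, and it compares seconds since midnight instead of time objects so that every step stays vectorized.

```python
import numpy as np
import pandas as pd


def average_by_period(df, night_end_seconds=6 * 3600):
    """Mean Level per date and period (illustrative helper, not a pandas API)."""
    stamps = pd.to_datetime(df["Date_Time"])
    # Seconds since midnight, computed with vectorized integer arithmetic.
    seconds = stamps.dt.hour * 3600 + stamps.dt.minute * 60 + stamps.dt.second
    # 06:00:00 itself counts as overnight, matching the mask above.
    period = np.where(seconds <= night_end_seconds,
                      "00:00:00 - 06:00:00", "06:00:01 - 23:59:59")
    return (
        df.assign(Date=stamps.dt.date, Time=period)
          .groupby(["Date", "Time"])["Level"]
          .mean()
          .reset_index(name="AvgLevel")
    )


# Same sample rows as above, with Date_Time left as a plain string column.
sample = pd.DataFrame({
    "Date_Time": [
        "2017-08-08 00:00:00", "2017-08-08 23:50:01", "2017-08-08 06:45:01",
        "2017-08-08 06:00:00", "2017-08-08 00:35:01",
        "2017-07-26 00:23:57", "2017-07-26 00:18:57", "2017-07-26 07:13:57",
        "2017-07-26 00:08:57", "2017-07-26 07:03:57",
    ],
    "Level": [239.0, 242.0, 246.0, 250.0, 254.0, 72.0, 67.0, 64.0, 64.0, 65.0],
})
result = average_by_period(sample)
print(result)
```

Because nothing loops over rows in Python, this should scale better than splitting the frame into separate tables, although the actual gain depends on the size of your data.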

Answer 2

Score: 1

You can do the following:

    import pandas as pd

    # data.csv holds the Date_Time and Level columns, separated by ";"
    df = pd.read_csv("data.csv", sep=";")
    print(df)

    df["Date_Time"] = pd.to_datetime(df["Date_Time"])
    df["Date"] = df["Date_Time"].dt.date
    df["Time"] = df["Date_Time"].dt.time

    # Default every row to overnight, then relabel rows after 06:00:00 as waking
    df["Time_Period"] = "Overnight"
    df.loc[df["Time"] > pd.to_datetime("06:00:00").time(), "Time_Period"] = "Waking"

    grouped = df.groupby(["Date", "Time_Period"])["Level"].mean().reset_index()
    grouped = grouped.rename(columns={"Time_Period": "Time", "Level": "AvgLevel"})
    grouped["Time"] = grouped["Time"].map({
        "Waking": "06:00:01 - 23:59:59",
        "Overnight": "00:00:00 - 06:00:00"
    })
    print(grouped)

Basically, each row is labeled as overnight or waking by its time of day, and the rows are then grouped by date and period.

This results in the following (I assume the expected output you posted is for the entire data frame, not just the rows shown):

             Date                 Time  AvgLevel
    0  2017-07-26  00:00:00 - 06:00:00      66.4
    1  2017-08-08  06:00:01 - 23:59:59     246.2
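
If you want to try this without a data.csv file, here is a self-contained sketch that feeds the ten rows visible in the question through io.StringIO. The ";"-separated layout is an assumption about the file, the rows elided by "..." in the question are not included, and the period labels are written directly instead of being mapped afterwards.

```python
from datetime import time
from io import StringIO

import pandas as pd

# The ten rows shown in the question, laid out as an assumed ";"-separated file.
raw = """Date_Time;Level
2017-08-08 23:55:01;239.0
2017-08-08 23:50:01;242.0
2017-08-08 23:45:01;246.0
2017-08-08 23:40:01;250.0
2017-08-08 23:35:01;254.0
2017-07-26 00:23:57;72.0
2017-07-26 00:18:57;67.0
2017-07-26 00:13:57;64.0
2017-07-26 00:08:57;64.0
2017-07-26 00:03:57;65.0
"""

df = pd.read_csv(StringIO(raw), sep=";", parse_dates=["Date_Time"])
df["Date"] = df["Date_Time"].dt.date

# Start with the overnight label, then relabel rows strictly after 06:00:00.
df["Time"] = "00:00:00 - 06:00:00"
df.loc[df["Date_Time"].dt.time > time(6, 0, 0), "Time"] = "06:00:01 - 23:59:59"

grouped = df.groupby(["Date", "Time"])["Level"].mean().reset_index(name="AvgLevel")
print(grouped)
```

Only two groups come out because the visible rows for 2017-07-26 are all shortly after midnight and those for 2017-08-08 are all late in the evening.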

Answer 3

Score: 0

With the pandas freq option, you can calculate the mean, sum, etc. over equal-length time windows: freq='H' for hourly, freq='12H' for 12-hourly, freq='D' for daily, and freq='BH' for business-hour calculations. (From pandas 2.2 onward the lowercase aliases 'h' and 'bh' are preferred and the uppercase forms are deprecated.)
An example is below:

    avg_12_hours = df.groupby(pd.Grouper(freq='12H', key='Date_Time'))['Level'].mean()

Since the periods you are asking for are not of equal length, you need to do some custom calculation.
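
One possible form of that custom calculation, offered as a sketch rather than the only way to do it, is to bin the time of day with pd.cut and unequal bin edges. The bin edges, the labels, and the six rows below (borrowed from the sample data earlier on this page) are illustrative choices.

```python
import pandas as pd

df = pd.DataFrame({
    "Date_Time": pd.to_datetime([
        "2017-08-08 00:00:00", "2017-08-08 06:00:00", "2017-08-08 06:45:01",
        "2017-08-08 23:50:01", "2017-07-26 00:23:57", "2017-07-26 07:13:57",
    ]),
    "Level": [239.0, 250.0, 246.0, 242.0, 72.0, 64.0],
})

# Seconds since midnight for every row.
seconds = (df["Date_Time"].dt.hour * 3600
           + df["Date_Time"].dt.minute * 60
           + df["Date_Time"].dt.second)

# Unequal, right-closed bins: (-1, 21600] covers 00:00:00 through 06:00:00,
# (21600, 86399] covers the rest of the day.
edges = [-1, 6 * 3600, 24 * 3600 - 1]
labels = ["00:00:00 - 06:00:00", "06:00:01 - 23:59:59"]

avg = (
    df.assign(Date=df["Date_Time"].dt.date,
              Period=pd.cut(seconds, bins=edges, labels=labels))
      # observed=True keeps only date/period combinations that actually occur.
      .groupby(["Date", "Period"], observed=True)["Level"]
      .mean()
      .reset_index(name="AvgLevel")
)
print(avg)
```

Adding more edges and labels extends this to any number of unequal periods, which a fixed freq window cannot express.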

huangapple
  • Posted on February 6, 2023 at 13:37:25
  • Please keep the link to this article when reposting: https://go.coder-hub.com/75357670.html