英文:
How to calculate average level on certain days and times in a pandas data frame
问题
我有一个数据框,如下所示
日期时间 | 水平 |
---|---|
2017-08-08 23:55:01 | 239.0 |
2017-08-08 23:50:01 | 242.0 |
2017-08-08 23:45:01 | 246.0 |
2017-08-08 23:40:01 | 250.0 |
2017-08-08 23:35:01 | 254.0 |
... | ... |
2017-07-26 00:23:57 | 72.0 |
2017-07-26 00:18:57 | 67.0 |
2017-07-26 00:13:57 | 64.0 |
2017-07-26 00:08:57 | 64.0 |
2017-07-26 00:03:57 | 65.0 |
我想要计算每天在清醒时间和夜间时间的平均水平
日期 | 时间 | 平均水平 |
---|---|---|
2017-08-08 | 00:00:00 - 06:00:00 | 178 |
2017-08-08 | 06:00:01 - 23:59:99 | 190 |
2017-09-08 | 00:00:00 - 06:00:00 | 174 |
2017-09-08 | 06:00:01 - 23:59:99 | 200 |
我已经尝试将其拆分为单独的表格并使用for循环,但那会使用太多内存并且耗时太长。
英文:
I have a data frame like so
Date_Time | Level |
---|---|
2017-08-08 23:55:01 | 239.0 |
2017-08-08 23:50:01 | 242.0 |
2017-08-08 23:45:01 | 246.0 |
2017-08-08 23:40:01 | 250.0 |
2017-08-08 23:35:01 | 254.0 |
... | ... |
2017-07-26 00:23:57 | 72.0 |
2017-07-26 00:18:57 | 67.0 |
2017-07-26 00:13:57 | 64.0 |
2017-07-26 00:08:57 | 64.0 |
2017-07-26 00:03:57 | 65.0 |
I want to calculate the average level on every day, during the waking hours and overnight hours
Date | Time | AvgLevel |
---|---|---|
2017-08-08 | 00:00:00 - 06:00:00 | 178 |
2017-08-08 | 06:00:01 - 23:59:99 | 190 |
2017-09-08 | 00:00:00 - 06:00:00 | 174 |
2017-09-08 | 06:00:01 - 23:59:99 | 200 |
I've already tried splitting into separate tables and using for loops however that uses too much memory and takes too much time
答案1
得分: 1
可以使用 np.where
来区分白天和夜晚的时间段。
创建样本数据
data = {
'Date_Time': [
'2017-08-08 00:00:00', '2017-08-08 23:50:01', '2017-08-08 06:45:01',
'2017-08-08 06:00:00', '2017-08-08 00:35:01',
'2017-07-26 00:23:57', '2017-07-26 00:18:57', '2017-07-26 07:13:57',
'2017-07-26 00:08:57', '2017-07-26 07:03:57'
],
'Level': [239.0, 242.0, 246.0, 250.0, 254.0, 72.0, 67.0, 64.0, 64.0, 65.0]
}
df = pd.DataFrame(data, columns=['Date_Time', 'Level'])
df['Date_Time'] = pd.to_datetime(df['Date_Time'])
df = df.set_index('Date_Time')
print(df)
创建白天和夜晚时间段的掩码
mask = (df.index.time >= pd.to_datetime('00:00:00').time()) & (df.index.time <= pd.to_datetime('06:00:00').time())
df['Period'] = np.where(mask, '00:00:00 - 06:00:00', '06:00:01 - 23:59:59')
df
按 Date_Time
和 Period
列分组,并计算平均 Level
result = df.groupby([df.index.date, 'Period'])['Level'].mean().reset_index()
result.columns = ['Date', 'Time', 'AvgLevel']
result
英文:
You can use np.where
to differentiate between waking hours and overnight hours
Creating sample data
data = {
'Date_Time': [
'2017-08-08 00:00:00', '2017-08-08 23:50:01', '2017-08-08 06:45:01',
'2017-08-08 06:00:00', '2017-08-08 00:35:01',
'2017-07-26 00:23:57', '2017-07-26 00:18:57', '2017-07-26 07:13:57',
'2017-07-26 00:08:57', '2017-07-26 07:03:57'
],
'Level': [239.0, 242.0, 246.0, 250.0, 254.0, 72.0, 67.0, 64.0, 64.0, 65.0]
}
df = pd.DataFrame(data, columns=['Date_Time', 'Level'])
df['Date_Time'] = pd.to_datetime(df['Date_Time'])
df = df.set_index('Date_Time')
print(df)
Level
Date_Time
2017-08-08 00:00:00 239.0
2017-08-08 23:50:01 242.0
2017-08-08 06:45:01 246.0
2017-08-08 06:00:00 250.0
2017-08-08 00:35:01 254.0
2017-07-26 00:23:57 72.0
2017-07-26 00:18:57 67.0
2017-07-26 07:13:57 64.0
2017-07-26 00:08:57 64.0
2017-07-26 07:03:57 65.0
Creating a mask of waking hours and overnight hours
mask = (df.index.time >= pd.to_datetime('00:00:00').time()) & (df.index.time <= pd.to_datetime('06:00:00').time())
df['Period'] = np.where(mask, '00:00:00 - 06:00:00', '06:00:01 - 23:59:59')
df
Level Period
Date_Time
2017-08-08 00:00:00 239.0 00:00:00 - 06:00:00
2017-08-08 23:50:01 242.0 06:00:01 - 23:59:59
2017-08-08 06:45:01 246.0 06:00:01 - 23:59:59
2017-08-08 06:00:00 250.0 00:00:00 - 06:00:00
2017-08-08 00:35:01 254.0 00:00:00 - 06:00:00
2017-07-26 00:23:57 72.0 00:00:00 - 06:00:00
2017-07-26 00:18:57 67.0 00:00:00 - 06:00:00
2017-07-26 07:13:57 64.0 06:00:01 - 23:59:59
2017-07-26 00:08:57 64.0 00:00:00 - 06:00:00
2017-07-26 07:03:57 65.0 06:00:01 - 23:59:59
Groupby the Date_Time
and Period
column and calculate average Level
result = df.groupby([df.index.date, 'Period'])['Level'].mean().reset_index()
result.columns = ['Date', 'Time', 'AvgLevel']
result
Date Time AvgLevel
0 2017-07-26 00:00:00 - 06:00:00 67.666667
1 2017-07-26 06:00:01 - 23:59:59 64.500000
2 2017-08-08 00:00:00 - 06:00:00 247.666667
3 2017-08-08 06:00:01 - 23:59:59 244.000000
答案2
得分: 1
以下是已翻译的代码部分:
import pandas as pd
df = pd.read_csv("data.csv", sep=";")
print(df)
df["Date_Time"] = pd.to_datetime(df["Date_Time"])
df["Date"] = df["Date_Time"].dt.date
df["Time"] = df["Date_Time"].dt.time
df["Time_Period"] = "Overnight"
df.loc[(df["Time"] >= pd.to_datetime("06:00:00").time()) & (df["Time"] <= pd.to_datetime("23:59:59").time()), "Time_Period"] = "Waking"
grouped = df.groupby(["Date", "Time_Period"])["Level"].mean().reset_index()
grouped = grouped.rename(columns={"Date": "Date", "Time_Period": "Time", "Level": "AvgLevel"})
grouped["Time"] = grouped["Time"].map({
"Waking": "06:00:01 - 23:59:99",
"Overnight": "00:00:00 - 06:00:00"
})
print(grouped)
希望对你有所帮助。如果你需要更多的帮助,请随时告诉我。
英文:
You can do the following:
import pandas as pd
df = pd.read_csv("data.csv", sep=";")
print(df)
df["Date_Time"] = pd.to_datetime(df["Date_Time"])
df["Date"] = df["Date_Time"].dt.date
df["Time"] = df["Date_Time"].dt.time
df["Time_Period"] = "Overnight"
df.loc[(df["Time"] >= pd.to_datetime("06:00:00").time()) & (df["Time"] <= pd.to_datetime("23:59:59").time()), "Time_Period"] = "Waking"
grouped = df.groupby(["Date", "Time_Period"])["Level"].mean().reset_index()
grouped = grouped.rename(columns={"Date": "Date", "Time_Period": "Time", "Level": "AvgLevel"})
grouped["Time"] = grouped["Time"].map({
"Waking": "06:00:01 - 23:59:99",
"Overnight": "00:00:00 - 06:00:00"
})
print(grouped)
Basically, you group entries by times that are nightly and daily:
This results in (I assume here that your expected outcome you print is for the entire dataframe):
Date Time AvgLevel
0 2017-07-26 00:00:00 - 06:00:00 66.4
1 2017-08-08 06:00:01 - 23:59:99 246.2
答案3
得分: 0
使用Pandas的freq
选项,可以计算均值、总和等在相等时间段内的数值,即freq='H'
用于小时计算,freq='12H'
用于12小时计算,freq='D'
用于每日计算,freq='BH'
用于工作小时计算。
示例如下:
avg_12_hours = df.groupby(pd.Grouper(freq='12H', key='Date_Time'))['Level'].mean()
由于您要进行不均等分割的计算期间,因此需要进行一些自定义计算。
英文:
by using Pandas freq
option, mean, sum etc. can be calculated for an equal portion of time i.e. freq='H'
for hourly calculation, freq='12H'
for 12 hourly calculation, freq='D'
for daily calculation and freq='BH'
for business hoursly calculations.
Example is below:
avg_12_hours = df.groupby(pd.Grouper(freq='12H', key='Date_Time'))['Level'].mean()
Since, you are asking for a calculation period which is not equally splitted so, you need to do some custom calculations
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论