How to get range of values in secondary index of pandas dataframe
Question
I have a multi-indexed pandas dataframe with two index levels. The first level is 'room' and the second is 'timestamp'. The columns of this table are 'total occupancy', 'temperature', 'power used' and 'event'.
The situation being tracked: consider a hotel with several ballrooms. These rooms get booked for events. Periodically, hotel management records the occupancy of each room and the room temperature.
I want to group by 'event' and get the difference between the max and min of 'total occupancy' and 'temperature'. I also want the difference between the max and min timestamps, so I can measure event length, but have been unable to get it.
For example, consider the following df:
import pandas as pd

# Initialize df with a (Room, TimeStamp) MultiIndex
rm_timestamp_indices = [('A', 1300), ('A', 1310), ('A', 1315),
                        ('B', 1200), ('B', 1230), ('B', 1350),
                        ('C', 1300), ('C', 1400)]
multi_index = pd.MultiIndex.from_tuples(rm_timestamp_indices, names=['Room', 'TimeStamp'])
df = pd.DataFrame(index=multi_index)
# Put data into df
df['temp'] = [77, 78, 73, 80, 76, 66, 73, 70]
df['pop'] = [100, 110, 200, 300, 315, 290, 245, 250]
df['event'] = ['q', 'q', 'w', 'r', 't', 't', 's', 's']
Now, I can get the difference between the max and min of each column with
df.groupby('event').apply(lambda x:x.max()-x.min())
but I have not been able to get the difference between the max and min of the timestamps for each event as well.
Answer 1
Score: 1
You could use reset_index to bring the TimeStamp index level into the dataframe as a column prior to grouping:
df.reset_index('TimeStamp').groupby('event').apply(lambda x:x.max()-x.min())
Output:
       TimeStamp  temp  pop
event
q             10     1   10
r              0     0    0
s            100     3    5
t            120    10   25
w              0     0    0
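Depending on the pandas version, apply may also pass the string 'event' column to the lambda, where subtraction is not defined. A variant that avoids this is sketched below: it selects the numeric columns explicitly and uses agg instead of apply. The names value_cols and ranges are only illustrative, and the data is the example df from the question.

```python
import pandas as pd

# Rebuild the example dataframe from the question.
rm_timestamp_indices = [('A', 1300), ('A', 1310), ('A', 1315),
                        ('B', 1200), ('B', 1230), ('B', 1350),
                        ('C', 1300), ('C', 1400)]
multi_index = pd.MultiIndex.from_tuples(rm_timestamp_indices, names=['Room', 'TimeStamp'])
df = pd.DataFrame(index=multi_index)
df['temp'] = [77, 78, 73, 80, 76, 66, 73, 70]
df['pop'] = [100, 110, 200, 300, 315, 290, 245, 250]
df['event'] = ['q', 'q', 'w', 'r', 't', 't', 's', 's']

# Columns to take max - min over (illustrative name, not from the original post).
value_cols = ['TimeStamp', 'temp', 'pop']

ranges = (
    df.reset_index('TimeStamp')          # move the TimeStamp level into a column
      .groupby('event')[value_cols]      # keep only the numeric columns per group
      .agg(lambda s: s.max() - s.min())  # range of each column within each event
)
print(ranges)
```

This should give the same table as above. Note that the timestamps are subtracted as plain integers; if they encode clock times such as 1230 and 1350, the result is 120 rather than 80 minutes, so converting them to a datetime or timedelta type first may be preferable.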