How to get the range of values in the secondary index of a pandas dataframe

Question

I have a multi-indexed pandas dataframe with two index levels. The first level is 'room', the second is 'timestamp'. The columns of this table are 'total occupancy', 'temperature', 'power used' and 'event'.

The situation being tracked: consider a hotel with several ballrooms. These rooms get booked for events. Periodically, hotel management records the occupancy of each room and its temperature.

I want to group by 'event' and get the difference between the max and min of 'total occupancy' and 'temperature'. I also want the difference between the max and min timestamps, so I can measure event length, but have been unable to do so.

For example, consider the following df:

import pandas as pd

# Initialize df
rm_timestamp_indices = [('A', 1300),('A', 1310),('A', 1315), 
                        ('B', 1200),('B', 1230),('B', 1350),
                        ('C', 1300),('C', 1400)]
multi_index = pd.MultiIndex.from_tuples(rm_timestamp_indices, names=['Room', 'TimeStamp'])
df = pd.DataFrame(index=multi_index)

# Put data into df
df['temp'] = [77,78,73,80,76,66,73,70]
df['pop'] = [100,110,200,300,315,290,245,250]
df['event'] = ['q','q','w','r','t','t','s','s']

Now, I can get the difference between the max and min of each column with

df.groupby('event').apply(lambda x: x.max() - x.min())

but I have not been able to also get the difference between the max and min of the timestamps for each event.

Answer 1

Score: 1

You could use reset_index to move the TimeStamp index level into a regular column of the dataframe before grouping:

df.reset_index('TimeStamp').groupby('event').apply(lambda x:x.max()-x.min())

Output:

       TimeStamp  temp  pop
event
q             10     1   10
r              0     0    0
s            100     3    5
t            120    10   25
w              0     0    0
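A variant of the same idea is sketched below under two assumptions. First, it selects the numeric columns explicitly before taking max minus min, instead of depending on how `apply` handles the string 'event' column. Second, it assumes TimeStamp is an HHMM clock time (e.g. 1350 meaning 13:50); if so, plain subtraction overstates durations that cross an hour boundary (event t runs from 1230 to 1350, which is 80 minutes, not 120). The `minutes` and `duration_min` columns are hypothetical names introduced for this sketch.

```python
import pandas as pd

# Rebuild the example frame from the question.
rm_timestamp_indices = [('A', 1300), ('A', 1310), ('A', 1315),
                        ('B', 1200), ('B', 1230), ('B', 1350),
                        ('C', 1300), ('C', 1400)]
multi_index = pd.MultiIndex.from_tuples(rm_timestamp_indices, names=['Room', 'TimeStamp'])
df = pd.DataFrame(index=multi_index)
df['temp'] = [77, 78, 73, 80, 76, 66, 73, 70]
df['pop'] = [100, 110, 200, 300, 315, 290, 245, 250]
df['event'] = ['q', 'q', 'w', 'r', 't', 't', 's', 's']

# Move the TimeStamp level into a regular column (Room stays as the index).
flat = df.reset_index('TimeStamp')

# Assumption: TimeStamp is HHMM, so convert it to minutes since midnight
# to make differences real durations (hypothetical 'minutes' column).
flat['minutes'] = (flat['TimeStamp'] // 100) * 60 + flat['TimeStamp'] % 100

# Select only numeric columns so the string 'event' column is never subtracted.
grouped = flat.groupby('event')[['TimeStamp', 'minutes', 'temp', 'pop']]
spread = (grouped.max() - grouped.min()).rename(columns={'minutes': 'duration_min'})

print(spread)
```

On the question's data, the TimeStamp, temp and pop columns match the output above, while duration_min gives 60 minutes for event s and 80 minutes for event t. If the real timestamps are already datetimes, the same max minus min pattern yields Timedelta values directly and the HHMM conversion is unnecessary.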

huangapple
  • Published on 2023-05-26 09:10:52
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76337071.html