May 26, 2023 09:10:52


How to get range of values in secondary index of pandas dataframe

Question

I have a multi-indexed pandas DataFrame with two index levels. The first level is 'room' and the second is 'timestamp'. The columns of this table are 'total occupancy', 'temperature', 'power used' and 'event'.

The situation being tracked is a hotel with several ballrooms. These rooms get booked for events. Periodically, hotel management records the occupancy and temperature of each room.

I want to group by 'event' and get the difference between the max and min of 'total occupancy' and 'temperature'. I also want the difference between the max and min timestamp of each event, so I can measure event length, but have been unable to.

For example, consider the following df:

import pandas as pd

# Initialize df
rm_timestamp_indices = [('A', 1300), ('A', 1310), ('A', 1315),
                        ('B', 1200), ('B', 1230), ('B', 1350),
                        ('C', 1300), ('C', 1400)]
multi_index = pd.MultiIndex.from_tuples(rm_timestamp_indices, names=['Room', 'TimeStamp'])
df = pd.DataFrame(index=multi_index)
# Put data into df
df['temp'] = [77, 78, 73, 80, 76, 66, 73, 70]
df['pop'] = [100, 110, 200, 300, 315, 290, 245, 250]
df['event'] = ['q', 'q', 'w', 'r', 't', 't', 's', 's']

Now, I can get the difference between the max and min of each column with

df.groupby('event').apply(lambda x: x.max() - x.min())

but have not been able to also get the difference between the max and min of the timestamps for each event.
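A quick check on the same sample frame suggests why the timestamps drop out: `x.max()` and `x.min()` reduce over the frame's columns, and `TimeStamp` is a level of the row index rather than a column, so it never shows up in the result. A minimal sketch (same data as above, built in one constructor call, with the `import pandas as pd` the snippet needs):

```python
import pandas as pd

# Rebuild the sample frame from the question
rm_timestamp_indices = [('A', 1300), ('A', 1310), ('A', 1315),
                        ('B', 1200), ('B', 1230), ('B', 1350),
                        ('C', 1300), ('C', 1400)]
multi_index = pd.MultiIndex.from_tuples(rm_timestamp_indices, names=['Room', 'TimeStamp'])
df = pd.DataFrame(
    {'temp': [77, 78, 73, 80, 76, 66, 73, 70],
     'pop': [100, 110, 200, 300, 315, 290, 245, 250],
     'event': ['q', 'q', 'w', 'r', 't', 't', 's', 's']},
    index=multi_index,
)

# Column-wise reductions such as max()/min() only ever see these columns...
print(df.columns.tolist())   # ['temp', 'pop', 'event']
# ...while the timestamps live in the row index
print(list(df.index.names))  # ['Room', 'TimeStamp']
```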

Answer 1

Score: 1

You could use reset_index to move the TimeStamp index level into a regular column before grouping:

df.reset_index('TimeStamp').groupby('event').apply(lambda x: x.max() - x.min())

Output:

       TimeStamp  temp  pop
event
q             10     1   10
r              0     0    0
s            100     3    5
t            120    10   25
w              0     0    0
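One caveat: the TimeStamp differences above are plain integer subtraction, so 1300 to 1400 comes out as 100. If the values are wall-clock times in HHMM form (an assumption; the question does not say what they encode), that span is really 60 minutes. A minimal sketch that reads the level with `get_level_values` (so no `reset_index` is needed), converts it to minutes since midnight, and takes the per-event max minus min; the helper `hhmm_to_minutes` and the column name `duration_min` are hypothetical:

```python
import pandas as pd

# Same sample frame as in the question
rm_timestamp_indices = [('A', 1300), ('A', 1310), ('A', 1315),
                        ('B', 1200), ('B', 1230), ('B', 1350),
                        ('C', 1300), ('C', 1400)]
multi_index = pd.MultiIndex.from_tuples(rm_timestamp_indices, names=['Room', 'TimeStamp'])
df = pd.DataFrame(
    {'temp': [77, 78, 73, 80, 76, 66, 73, 70],
     'pop': [100, 110, 200, 300, 315, 290, 245, 250],
     'event': ['q', 'q', 'w', 'r', 't', 't', 's', 's']},
    index=multi_index,
)


def hhmm_to_minutes(hhmm):
    """Hypothetical helper: assumes HHMM clock times, e.g. 1350 -> 13*60 + 50 = 830."""
    return (hhmm // 100) * 60 + hhmm % 100


# Read the TimeStamp level straight from the index and keep it aligned with df's rows
stamps = df.index.get_level_values('TimeStamp').to_numpy()
minutes = pd.Series(hhmm_to_minutes(stamps), index=df.index)

# Max minus min of the regular columns, per event
by_event = df.groupby('event')[['temp', 'pop']]
ranges = by_event.max() - by_event.min()

# Max minus min of the converted timestamps, per event (aligned on the event index)
minutes_by_event = minutes.groupby(df['event'])
ranges['duration_min'] = minutes_by_event.max() - minutes_by_event.min()

print(ranges)
# duration_min per event: q=10, r=0, s=60, t=80, w=0
```

With plain integer subtraction, events t and s would read 120 and 100, as in the output above; under the HHMM assumption they span 80 and 60 minutes.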
