2023-06-12 19:14:22

Calculate login duration by month

Question

This is my initial dataframe:

Index	ID	Event	Datetime
1	1	Login	10.03.2023 12:00:00.000
2	1	Logout	05.04.2023 12:00:00.000
3	2	Login	20.04.2023 12:00:00.000
4	2	Logout	22.04.2023 12:00:00.000
5	1	Login	20.05.2023 12:00:00.000
6	1	Login	21.05.2023 12:00:00.000
7	1	Logout	22.05.2023 12:00:00.000
8	1	Logout	25.05.2023 12:00:00.000
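To reproduce the table above, the frame could be built as follows. This is a minimal sketch that assumes the Datetime column arrives as day-first strings; the question does not say whether it already has a datetime dtype.

```python
import pandas as pd

# Sample data from the question; Datetime strings are day.month.year.
df = pd.DataFrame(
    {
        "ID": [1, 1, 2, 2, 1, 1, 1, 1],
        "Event": ["Login", "Logout", "Login", "Logout",
                  "Login", "Login", "Logout", "Logout"],
        "Datetime": [
            "10.03.2023 12:00:00.000",
            "05.04.2023 12:00:00.000",
            "20.04.2023 12:00:00.000",
            "22.04.2023 12:00:00.000",
            "20.05.2023 12:00:00.000",
            "21.05.2023 12:00:00.000",
            "22.05.2023 12:00:00.000",
            "25.05.2023 12:00:00.000",
        ],
    },
    index=pd.RangeIndex(1, 9, name="Index"),
)

# An explicit format makes sure 05.04.2023 is read as 5 April, not 4 May.
df["Datetime"] = pd.to_datetime(df["Datetime"], format="%d.%m.%Y %H:%M:%S.%f")
```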

How do I get the following result in pandas?

ID	TimeInDays	Month
1	21	March
1	5	April
1	6	May
2	2	April

Do I first have to split it into a login and a logout dataframe and then join them?

Answer 1

Score: 3

You could use a custom function to split the months:

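import pandas as pd

def split_months(start, end):
    # Cut [start, end] at every month end in between, then measure each piece.
    # pd.offsets.MonthEnd() is the offset behind freq='M' (spelled 'ME' in newer pandas).
    return (pd.Series([start, end, *pd.date_range(start, end, freq=pd.offsets.MonthEnd())])
              .drop_duplicates().sort_values(ignore_index=True)
              .to_frame(name='date')
              .assign(TimeInDays=lambda d: d['date'].diff(),
                      Month=lambda d: d.pop('date').dt.month_name()
                     )
              .iloc[1:]  # the first row has no previous date to diff against
           )

# Assumes df['Datetime'] already has a datetime dtype.
tmp = df.sort_values(by=['ID', 'Datetime'])
out = (tmp
  # each Login starts a new group; the Logouts that follow it belong to that group
  .groupby(['ID', df['Event'].eq('Login').cumsum()])
  .apply(lambda g: split_months(g['Datetime'].min(), g['Datetime'].max()))
  .reset_index('ID')
)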
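The grouping key in the code above is a running count of Login rows taken over the whole frame, so it relies on the rows of one session not being interleaved with another user's rows. A possible variation, sketched here under the assumption that a per-user counter is wanted, counts logins within each ID after sorting. The names `session` and `spans` are illustrative and not part of the answer.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 1, 2, 2, 1, 1, 1, 1],
    "Event": ["Login", "Logout", "Login", "Logout",
              "Login", "Login", "Logout", "Logout"],
    "Datetime": pd.to_datetime([
        "2023-03-10 12:00", "2023-04-05 12:00",
        "2023-04-20 12:00", "2023-04-22 12:00",
        "2023-05-20 12:00", "2023-05-21 12:00",
        "2023-05-22 12:00", "2023-05-25 12:00",
    ]),
})

tmp = df.sort_values(["ID", "Datetime"])

# Count logins separately for each ID: every Login opens a new session
# and the rows that follow it (until the next Login) share its number.
session = (tmp["Event"].eq("Login").astype(int)
              .groupby(tmp["ID"]).cumsum()
              .rename("session"))

# One row per session with its first and last timestamp.
spans = (tmp.groupby(["ID", session])["Datetime"]
            .agg(start="min", end="max")
            .reset_index())
```

With the sample data this gives four sessions; the lone login on 20 May forms a session whose start and end coincide, which is why it contributes no days in the answer's output.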

Note: if your data spans several years, you might want to use Month=lambda d: d.pop('date').dt.to_period('M') to avoid ambiguity between, say, March 2022 and March 2023. If you want the number of days as an integer, use TimeInDays=lambda d: d['date'].diff().dt.days.
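Both variations from the note can be combined in one helper. The following is a sketch only: it reuses the answer's function name, and the final astype step is an addition of mine, because the leading NaT in the diff may cause .dt.days to come back as floats.

```python
import pandas as pd

def split_months(start, end):
    # Variant of the answer's helper: whole-day counts and year-aware month labels.
    cuts = pd.date_range(start, end, freq=pd.offsets.MonthEnd())
    return (pd.Series([start, end, *cuts])
              .drop_duplicates().sort_values(ignore_index=True)
              .to_frame(name="date")
              .assign(TimeInDays=lambda d: d["date"].diff().dt.days,
                      Month=lambda d: d.pop("date").dt.to_period("M"))
              .iloc[1:]                       # drop the row with no previous date
              .astype({"TimeInDays": int}))   # assumption: cast after removing the NaN row

# A session that crosses a year boundary (made-up dates, for illustration only).
parts = split_months(pd.Timestamp("2022-12-20 12:00"),
                     pd.Timestamp("2023-01-03 12:00"))
```

Here `parts` holds one row per month touched by the session, labelled 2022-12 and 2023-01 rather than the bare names December and January.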

Output:

   ID TimeInDays  Month
0   1    21 days  March
1   1     5 days  April
2   1     4 days    May
3   2     2 days  April
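The output above shows 4 days for May, while the question asks for 6. One reading that reproduces 6 is to treat the two May logins as separate, overlapping sessions and to pair the n-th login of each ID with its n-th logout. The sketch below follows that assumption, which the question does not confirm; it also assumes every login has a matching logout. The names `n`, `pairs` and `pieces` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 1, 2, 2, 1, 1, 1, 1],
    "Event": ["Login", "Logout", "Login", "Logout",
              "Login", "Login", "Logout", "Logout"],
    "Datetime": pd.to_datetime([
        "2023-03-10 12:00", "2023-04-05 12:00",
        "2023-04-20 12:00", "2023-04-22 12:00",
        "2023-05-20 12:00", "2023-05-21 12:00",
        "2023-05-22 12:00", "2023-05-25 12:00",
    ]),
})

def split_months(start, end):
    # Cut [start, end] at each month end and return whole days per month.
    cuts = pd.date_range(start, end, freq=pd.offsets.MonthEnd())
    s = (pd.Series([start, end, *cuts])
           .drop_duplicates().sort_values(ignore_index=True))
    return (pd.DataFrame({"TimeInDays": s.diff().dt.days,
                          "Month": s.dt.month_name()})
              .iloc[1:]
              .astype({"TimeInDays": int}))

tmp = df.sort_values(["ID", "Datetime"])

# Number logins and logouts separately within each ID,
# so that the n-th login lines up with the n-th logout.
tmp["n"] = tmp.groupby(["ID", "Event"]).cumcount()
pairs = tmp.pivot(index=["ID", "n"], columns="Event",
                  values="Datetime").reset_index()

# Split every login/logout pair by month, then add up the days per ID and month.
pieces = [split_months(row.Login, row.Logout).assign(ID=row.ID)
          for row in pairs.itertuples(index=False)]
out = (pd.concat(pieces)
         .groupby(["ID", "Month"], sort=False, as_index=False)["TimeInDays"]
         .sum())
```

For the sample data, the May sessions contribute 2 and 4 days, which sum to the 6 days the question expects.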
