按月计算登录时长

huangapple go评论57阅读模式
英文:

Calculate login duration by month

问题

这是我的初始数据框架:

索引 ID 事件 日期时间
1 1 登录 10.03.2023 12:00:00.000
2 1 注销 05.04.2023 12:00:00.000
3 2 登录 20.04.2023 12:00:00.000
4 2 注销 22.04.2023 12:00:00.000
5 1 登录 20.05.2023 12:00:00.000
6 1 登录 21.05.2023 12:00:00.000
7 1 注销 22.05.2023 12:00:00.000
8 1 注销 25.05.2023 12:00:00.000

如何在pandas中获得以下结果?

ID TimeInDays 月份
1 21 三月
1 5 四月
1 6 五月
2 2 四月

首先,您是否需要将其分割为登录和注销数据框,然后将它们连接在一起?

英文:

This is my initial dataframe:

Index ID Event Datetime
1 1 Login 10.03.2023 12:00:00.000
2 1 Logout 05.04.2023 12:00:00.000
3 2 Login 20.04.2023 12:00:00.000
4 2 Logout 22.04.2023 12:00:00.000
5 1 Login 20.05.2023 12:00:00.000
6 1 Login 21.05.2023 12:00:00.000
7 1 Logout 22.05.2023 12:00:00.000
8 1 Logout 25.05.2023 12:00:00.000

How do I get the following result in pandas?

ID TimeInDays Month
1 21 March
1 5 April
1 6 May
2 2 April

Do I firstly have to split it into a login and a logout dataframe and then joining it?

答案1

得分: 3

你可以使用自定义函数来拆分月份:

def split_months(start, end):
    return (pd.Series([start, end, *pd.date_range(start, end, freq='M')])
              .drop_duplicates().sort_values(ignore_index=True)
              .to_frame(name='date')
              .assign(TimeInDays=lambda d: d['date'].diff(),
                      Month=lambda d: d.pop('date').dt.month_name()
                     )
              .iloc[1:]
           )

tmp = df.sort_values(by=['ID', 'Datetime'])

out = (tmp
  .groupby(['ID', df['Event'].eq('Login').cumsum()])
  .apply(lambda g: split_months(g['Datetime'].min(), g['Datetime'].max()))
  .reset_index('ID')
)

注意:如果你有多年的数据,你可能需要使用 Month=lambda d: d.pop('date').dt.to_period('M') 来避免歧义。如果你想要天数的整数值,可以使用 TimeInDays=lambda d: d['date'].diff().dt.days

输出:

   ID TimeInDays  Month
0   1    21 days  March
1   1     5 days  April
2   1     4 days    May
3   2     2 days  April
英文:

You could use a custom function to split the months:

def split_months(start, end):
    return (pd.Series([start, end, *pd.date_range(start, end, freq='M')])
              .drop_duplicates().sort_values(ignore_index=True)
              .to_frame(name='date')
              .assign(TimeInDays=lambda d: d['date'].diff(),
                      Month=lambda d: d.pop('date').dt.month_name()
                     )
              .iloc[1:]
           )

tmp = df.sort_values(by=['ID', 'Datetime'])

out = (tmp
  .groupby(['ID', df['Event'].eq('Login').cumsum()])
  .apply(lambda g: split_months(g['Datetime'].min(), g['Datetime'].max()))
  .reset_index('ID')
)

NB. if you have several years, you might want to use Month=lambda d: d.pop('date').dt.to_period('M') to avoid ambiguity. If you want an integer for the number of days, use TimeInDays=lambda d: d['date'].diff().dt.days.

Output:

   ID TimeInDays  Month
0   1    21 days  March
1   1     5 days  April
2   1     4 days    May
3   2     2 days  April

huangapple
  • 本文由 发表于 2023年6月12日 19:14:22
  • 转载请务必保留本文链接:https://go.coder-hub.com/76456108.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定