英文:
How to do a rolling count of transactions grouped by user
问题
以下是您要翻译的内容:
- Set the index by the datetime
df1 = df1.set_index('authorizationProcessedAt')
- Calculate the rolling count by userId
transaction_counts = df1.groupby('userId')['transactionId'].rolling(5).count()
- I then rename and join the two dataframes together
transaction_counts = pd.DataFrame(transaction_counts)
transaction_counts.rename(columns={'transactionId':'transaction_count'}, inplace=True)
df1 = pd.concat([df1, transaction_counts], axis=0)
The results produced look like this and are not what I need:
英文:
I have a dataset that contains userIds, datetime, transactionId, amount, and merchantDescription columns for the dates ranging between 2022-11-01 and 2022-12-31.
I need to count user transactions during a rolling 5, 10, and 30 day period but I am having difficulty.
Here is my process:
- Set the index by the datetime
df1 = df1.set_index('authorizationProcessedAt')
- Calculate the rolling count by userId
transaction_counts = df1.groupby('userId')['transactionId'].rolling(5).count()
- I then rename and join the two dataframes together
transaction_counts = pd.DataFrame(transaction_counts)
transaction_counts.rename(columns={'transactionId':'transaction_count'}, inplace=True)
df1 = pd.concat([df1, transaction_counts], axis=0)
The results produced look like this and are not what I need:
Can anyone advise how to achieve a rolling count by user?
答案1
得分: 0
尝试这个
# 将索引设置为日期时间
df1 = df1.set_index(pd.DatetimeIndex(df1['authorizationProcessedAt']))
# 按userId分组并使用滚动时间窗口计算交易数量
rolling_counts = df1.groupby('userId')['transactionId'].rolling('5D').count()
# 重命名并合并
rolling_counts = rolling_counts.rename('transaction_count').reset_index().drop('userId', axis=1)
df = pd.merge(df1, rolling_counts, left_index=True, right_on='authorizationProcessedAt')
您可以将'5D'替换为'10D'或'30D',以分别获得10天或30天的滚动计数。
英文:
Try this
# Set the index to datetime
df1 = df1.set_index(pd.DatetimeIndex(df1['authorizationProcessedAt']))
# Group by userId and a rolling time window, and count the number of transactions
rolling_counts = df1.groupby('userId')['transactionId'].rolling('5D').count()
#rename & merge
rolling_counts = rolling_counts.rename('transaction_count').reset_index().drop('userId', axis=1)
df = pd.merge(df1, rolling_counts, left_index=True, right_on='authorizationProcessedAt')
You can replace '5D' with '10D' or '30D' to get rolling counts for 10 days or 30 days respectively
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论