如何对用户分组的交易进行滚动计数

huangapple go评论54阅读模式
英文:

How to do a rolling count of transactions grouped by user

问题

以下是您要翻译的内容:

  1. Set the index by the datetime
df1 = df1.set_index('authorizationProcessedAt')
  1. Calculate the rolling count by userId
transaction_counts = df1.groupby('userId')['transactionId'].rolling(5).count()
  1. I then rename and join the two dataframes together
transaction_counts = pd.DataFrame(transaction_counts)
transaction_counts.rename(columns={'transactionId':'transaction_count'}, inplace=True)
df1 = pd.concat([df1, transaction_counts], axis=0)

The results produced look like this and are not what I need:

head of dataframe

英文:

I have a dataset that contains userIds, datetime, transactionId, amount, and merchantDescription columns for the dates ranging between 2022-11-01 and 2022-12-31.

I need to count user transactions during a rolling 5, 10, and 30 day period but I am having difficulty.

Here is my process:

  1. Set the index by the datetime
df1 = df1.set_index('authorizationProcessedAt')
  1. Calculate the rolling count by userId
transaction_counts = df1.groupby('userId')['transactionId'].rolling(5).count()
  1. I then rename and join the two dataframes together
transaction_counts = pd.DataFrame(transaction_counts)
transaction_counts.rename(columns={'transactionId':'transaction_count'}, inplace=True)
df1 = pd.concat([df1, transaction_counts], axis=0)

The results produced look like this and are not what I need:

head of dataframe

Can anyone advise how to achieve a rolling count by user?

答案1

得分: 0

尝试这个

# 将索引设置为日期时间
df1 = df1.set_index(pd.DatetimeIndex(df1['authorizationProcessedAt']))
# 按userId分组并使用滚动时间窗口计算交易数量
rolling_counts = df1.groupby('userId')['transactionId'].rolling('5D').count()
# 重命名并合并
rolling_counts = rolling_counts.rename('transaction_count').reset_index().drop('userId', axis=1)
df = pd.merge(df1, rolling_counts, left_index=True, right_on='authorizationProcessedAt')

您可以将'5D'替换为'10D'或'30D',以分别获得10天或30天的滚动计数。

英文:

Try this

# Set the index to datetime
df1 = df1.set_index(pd.DatetimeIndex(df1['authorizationProcessedAt']))
# Group by userId and a rolling time window, and count the number of transactions
rolling_counts = df1.groupby('userId')['transactionId'].rolling('5D').count()
#rename & merge
rolling_counts = rolling_counts.rename('transaction_count').reset_index().drop('userId', axis=1)
df = pd.merge(df1, rolling_counts, left_index=True, right_on='authorizationProcessedAt')

You can replace '5D' with '10D' or '30D' to get rolling counts for 10 days or 30 days respectively

huangapple
  • 本文由 发表于 2023年3月10日 00:35:16
  • 转载请务必保留本文链接:https://go.coder-hub.com/75687515.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定