显示最近X天内客户交易频率。

huangapple go评论141阅读模式
英文:

Display customer transaction frequency within the last X days

问题

我想要创建一个新的列,以显示每个客户在过去X天内的交易次数,每次有新交易时都会更新。其中X = 1。

我有两列,Customer_ID 和 TransactionDate。我想要创建一个名为 "HowManyTransactionsInLast1Day" 的列。

我希望如下所示:

Customer_ID  TransactionDate    HowManyTransactionsInLast1Day
A	     2021-03-03 	        0
B	     2021-05-10 	        0
B	     2022-09-06 	        0
C	     2010-09-10 	        0
D	     2009-05-24 	        0
D	     2009-05-25 	        1
D	     2009-05-25 	        2
D	     2009-05-26	            2
D	     2009-08-14 	        0
D	     2009-08-14 	        1

我能够使用 pandas 的 .rolling 方法找到一个解决方案,似乎可以完成任务,但在处理500多万行时速度太慢了。以下是我的代码:

delta = 1
df['%sDTransactionCount_Cust' %(delta)] = df.assign(count=1).groupby(['Customer_ID']).apply(lambda x: x.rolling('%sD' %delta, on='TransactionDate').sum())['count'].astype(int)-1

有没有更快、更简洁的方法来实现这个目标?任何见解将不胜感激!

英文:

I want to create a new column to display the number of transactions each customer has had in the last X days, updated with each new transaction. Let X = 1.

I have the columns Customer_ID and TransactionDate. I want to create column "HowManyTransactionsInLast1Day".

Something like this is what I want:

Customer_ID  TransactionDate    HowManyTransactionsInLast1Day
A	     2021-03-03 	        0
B	     2021-05-10 	        0
B	     2022-09-06 	        0
C	     2010-09-10 	        0
D	     2009-05-24 	        0
D	     2009-05-25 	        1
D	     2009-05-25 	        2
D	     2009-05-26	            2
D	     2009-08-14 	        0
D	     2009-08-14 	        1

I was able to find a solution using pandas .rolling that seems to do the trick, but it's way too slow with the 5M+ rows I'm working with. Here's what I have:

delta = 1
df['%sDTransactionCount_Cust' %(delta)] = df.assign(count=1).groupby(['Customer_ID']).apply(lambda x: x.rolling('%sD' %delta, on='TransactionDate').sum())['count'].astype(int)-1

Is there a faster, clean way to do this? Any insight would be much appreciated!

答案1

得分: 0

apply删除后,我觉得代码看起来更清晰。我认为它也会稍微快一点:

data = '''
Customer_ID,TransactionDate
A,2021-03-03
B,2021-05-10
B,2022-09-06
C,2010-09-10
D,2009-05-24
D,2009-05-25
D,2009-05-25
D,2009-05-26
D,2009-08-14
D,2009-08-14
'''

df = pd.read_csv(StringIO(data), parse_dates=['TransactionDate'])

df.assign(HowManyTransactionsInLast1Day=1) \
    .groupby('Customer_ID') \
    .rolling('1D', on='TransactionDate', closed='both')['HowManyTransactionsInLast1Day'] \
    .sum() - 1

结果如下:

Customer_ID  TransactionDate
A            2021-03-03         0.0
B            2021-05-10         0.0
             2022-09-06         0.0
C            2010-09-10         0.0
D            2009-05-24         0.0
             2009-05-25         1.0
             2009-05-25         2.0
             2009-05-26         2.0
             2009-08-14         0.0
             2009-08-14         1.0
Name: HowManyTransactionsInLast1Day, dtype: float64
英文:

Removing the apply makes it look cleaner to me. I think it will be a little faster as well:

data = '''
Customer_ID,TransactionDate
A,2021-03-03
B,2021-05-10
B,2022-09-06
C,2010-09-10
D,2009-05-24
D,2009-05-25
D,2009-05-25
D,2009-05-26
D,2009-08-14
D,2009-08-14
'''

df = pd.read_csv(StringIO(data), parse_dates=['TransactionDate'])

df.assign(HowManyTransactionsInLast1Day=1) \
    .groupby('Customer_ID') \
    .rolling('1D', on='TransactionDate', closed='both')['HowManyTransactionsInLast1Day'] \
    .sum() - 1

Customer_ID  TransactionDate
A            2021-03-03         0.0
B            2021-05-10         0.0
             2022-09-06         0.0
C            2010-09-10         0.0
D            2009-05-24         0.0
             2009-05-25         1.0
             2009-05-25         2.0
             2009-05-26         2.0
             2009-08-14         0.0
             2009-08-14         1.0
Name: HowManyTransactionsInLast1Day, dtype: float64

huangapple
  • 本文由 发表于 2023年8月11日 00:29:54
  • 转载请务必保留本文链接:https://go.coder-hub.com/76877669.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定