英文:
Display customer transaction frequency within the last X days
问题
我想要创建一个新的列,以显示每个客户在过去X天内的交易次数,每次有新交易时都会更新。其中X = 1。
我有两列,Customer_ID 和 TransactionDate。我想要创建一个名为 "HowManyTransactionsInLast1Day" 的列。
我希望如下所示:
Customer_ID TransactionDate HowManyTransactionsInLast1Day
A 2021-03-03 0
B 2021-05-10 0
B 2022-09-06 0
C 2010-09-10 0
D 2009-05-24 0
D 2009-05-25 1
D 2009-05-25 2
D 2009-05-26 2
D 2009-08-14 0
D 2009-08-14 1
我能够使用 pandas 的 .rolling 方法找到一个解决方案,似乎可以完成任务,但在处理500多万行时速度太慢了。以下是我的代码:
delta = 1
df['%sDTransactionCount_Cust' %(delta)] = df.assign(count=1).groupby(['Customer_ID']).apply(lambda x: x.rolling('%sD' %delta, on='TransactionDate').sum())['count'].astype(int)-1
有没有更快、更简洁的方法来实现这个目标?任何见解将不胜感激!
英文:
I want to create a new column to display the number of transactions each customer has had in the last X days, updated with each new transaction. Let X = 1.
I have the columns Customer_ID and TransactionDate. I want to create column "HowManyTransactionsInLast1Day".
Something like this is what I want:
Customer_ID TransactionDate HowManyTransactionsInLast1Day
A 2021-03-03 0
B 2021-05-10 0
B 2022-09-06 0
C 2010-09-10 0
D 2009-05-24 0
D 2009-05-25 1
D 2009-05-25 2
D 2009-05-26 2
D 2009-08-14 0
D 2009-08-14 1
I was able to find a solution using pandas .rolling that seems to do the trick, but it's way too slow with the 5M+ rows I'm working with. Here's what I have:
delta = 1
df['%sDTransactionCount_Cust' %(delta)] = df.assign(count=1).groupby(['Customer_ID']).apply(lambda x: x.rolling('%sD' %delta, on='TransactionDate').sum())['count'].astype(int)-1
Is there a faster, clean way to do this? Any insight would be much appreciated!
答案1
得分: 0
将apply
删除后,我觉得代码看起来更清晰。我认为它也会稍微快一点:
data = '''
Customer_ID,TransactionDate
A,2021-03-03
B,2021-05-10
B,2022-09-06
C,2010-09-10
D,2009-05-24
D,2009-05-25
D,2009-05-25
D,2009-05-26
D,2009-08-14
D,2009-08-14
'''
df = pd.read_csv(StringIO(data), parse_dates=['TransactionDate'])
df.assign(HowManyTransactionsInLast1Day=1) \
.groupby('Customer_ID') \
.rolling('1D', on='TransactionDate', closed='both')['HowManyTransactionsInLast1Day'] \
.sum() - 1
结果如下:
Customer_ID TransactionDate
A 2021-03-03 0.0
B 2021-05-10 0.0
2022-09-06 0.0
C 2010-09-10 0.0
D 2009-05-24 0.0
2009-05-25 1.0
2009-05-25 2.0
2009-05-26 2.0
2009-08-14 0.0
2009-08-14 1.0
Name: HowManyTransactionsInLast1Day, dtype: float64
英文:
Removing the apply makes it look cleaner to me. I think it will be a little faster as well:
data = '''
Customer_ID,TransactionDate
A,2021-03-03
B,2021-05-10
B,2022-09-06
C,2010-09-10
D,2009-05-24
D,2009-05-25
D,2009-05-25
D,2009-05-26
D,2009-08-14
D,2009-08-14
'''
df = pd.read_csv(StringIO(data), parse_dates=['TransactionDate'])
df.assign(HowManyTransactionsInLast1Day=1) \
.groupby('Customer_ID') \
.rolling('1D', on='TransactionDate', closed='both')['HowManyTransactionsInLast1Day'] \
.sum() - 1
Customer_ID TransactionDate
A 2021-03-03 0.0
B 2021-05-10 0.0
2022-09-06 0.0
C 2010-09-10 0.0
D 2009-05-24 0.0
2009-05-25 1.0
2009-05-25 2.0
2009-05-26 2.0
2009-08-14 0.0
2009-08-14 1.0
Name: HowManyTransactionsInLast1Day, dtype: float64
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论