2023年8月11日 00:29:54go评论185阅读模式

英文:

Display customer transaction frequency within the last X days

问题

我想要创建一个新的列，以显示每个客户在过去X天内的交易次数，每次有新交易时都会更新。其中X = 1。

我有两列，Customer_ID 和 TransactionDate。我想要创建一个名为 "HowManyTransactionsInLast1Day" 的列。

我希望如下所示：

Customer_ID  TransactionDate    HowManyTransactionsInLast1Day
A	     2021-03-03 	        0
B	     2021-05-10 	        0
B	     2022-09-06 	        0
C	     2010-09-10 	        0
D	     2009-05-24 	        0
D	     2009-05-25 	        1
D	     2009-05-25 	        2
D	     2009-05-26	            2
D	     2009-08-14 	        0
D	     2009-08-14 	        1

我能够使用 pandas 的 .rolling 方法找到一个解决方案，似乎可以完成任务，但在处理500多万行时速度太慢了。以下是我的代码：

delta = 1
df['%sDTransactionCount_Cust' %(delta)] = df.assign(count=1).groupby(['Customer_ID']).apply(lambda x: x.rolling('%sD' %delta, on='TransactionDate').sum())['count'].astype(int)-1

有没有更快、更简洁的方法来实现这个目标？任何见解将不胜感激！

英文:

I want to create a new column to display the number of transactions each customer has had in the last X days, updated with each new transaction. Let X = 1.

I have the columns Customer_ID and TransactionDate. I want to create column "HowManyTransactionsInLast1Day".

Something like this is what I want:

Customer_ID  TransactionDate    HowManyTransactionsInLast1Day
A	     2021-03-03 	        0
B	     2021-05-10 	        0
B	     2022-09-06 	        0
C	     2010-09-10 	        0
D	     2009-05-24 	        0
D	     2009-05-25 	        1
D	     2009-05-25 	        2
D	     2009-05-26	            2
D	     2009-08-14 	        0
D	     2009-08-14 	        1

I was able to find a solution using pandas .rolling that seems to do the trick, but it's way too slow with the 5M+ rows I'm working with. Here's what I have:

delta = 1
df[&#39;%sDTransactionCount_Cust&#39; %(delta)] = df.assign(count=1).groupby([&#39;Customer_ID&#39;]).apply(lambda x: x.rolling(&#39;%sD&#39; %delta, on=&#39;TransactionDate&#39;).sum())[&#39;count&#39;].astype(int)-1

Is there a faster, clean way to do this? Any insight would be much appreciated!

答案1

得分: 0

将apply删除后，我觉得代码看起来更清晰。我认为它也会稍微快一点：

data = '''
Customer_ID,TransactionDate
A,2021-03-03
B,2021-05-10
B,2022-09-06
C,2010-09-10
D,2009-05-24
D,2009-05-25
D,2009-05-25
D,2009-05-26
D,2009-08-14
D,2009-08-14
'''
df = pd.read_csv(StringIO(data), parse_dates=['TransactionDate'])
df.assign(HowManyTransactionsInLast1Day=1) \
    .groupby('Customer_ID') \
    .rolling('1D', on='TransactionDate', closed='both')['HowManyTransactionsInLast1Day'] \
    .sum() - 1

结果如下：

Customer_ID  TransactionDate
A            2021-03-03         0.0
B            2021-05-10         0.0
             2022-09-06         0.0
C            2010-09-10         0.0
D            2009-05-24         0.0
             2009-05-25         1.0
             2009-05-25         2.0
             2009-05-26         2.0
             2009-08-14         0.0
             2009-08-14         1.0
Name: HowManyTransactionsInLast1Day, dtype: float64

英文:

Removing the apply makes it look cleaner to me. I think it will be a little faster as well:

data = &#39;&#39;&#39;
Customer_ID,TransactionDate
A,2021-03-03
B,2021-05-10
B,2022-09-06
C,2010-09-10
D,2009-05-24
D,2009-05-25
D,2009-05-25
D,2009-05-26
D,2009-08-14
D,2009-08-14
&#39;&#39;&#39;
df = pd.read_csv(StringIO(data), parse_dates=[&#39;TransactionDate&#39;])
df.assign(HowManyTransactionsInLast1Day=1) \
    .groupby(&#39;Customer_ID&#39;) \
    .rolling(&#39;1D&#39;, on=&#39;TransactionDate&#39;, closed=&#39;both&#39;)[&#39;HowManyTransactionsInLast1Day&#39;] \
    .sum() - 1
Customer_ID  TransactionDate
A            2021-03-03         0.0
B            2021-05-10         0.0
             2022-09-06         0.0
C            2010-09-10         0.0
D            2009-05-24         0.0
             2009-05-25         1.0
             2009-05-25         2.0
             2009-05-26         2.0
             2009-08-14         0.0
             2009-08-14         1.0
Name: HowManyTransactionsInLast1Day, dtype: float64

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

显示最近X天内客户交易频率。

问题

答案1

使用Python的tkinter创建的模拟时钟，但一运行它，就在1秒后停止。

如何在Python 3中循环遍历CSV数值数据作为单独的行

获取使用 boto3 的 wcu/rcu 为 200 或 200+ 的表格。

Scheduling a python script with Windows task scheduler

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。