英文:
Cohort Analysis by First Product
问题
以下是您要翻译的内容:
"我已经尝试过自己解决这个问题,但我遇到了我通常的新手问题,似乎无法概念化如何将这个问题拆分为SQL(BigQuery)中的解决方案。
实际上,我正在尝试确定客户如何根据他们的初始产品购买产品,但我想了解n+1矩阵以及n+x下钻是什么,对于特定的起始产品。
例如,如果我有以下数据:
客户 月份 产品 收入
-------- ----- ------- -------
001 2019-01 产品-A 1.00
001 2019-02 产品-B 5.00
001 2019-02 产品-A 2.50
001 2019-03 产品-C 10.00
002 2019-01 产品-B 3.00
002 2019-02 产品-A 4.00
003 2019-02 产品-A 1.50
003 2019-05 产品-B 2.10
我期望n+1矩阵(即,在产品-x之后,有多少客户购买了产品-y?)看起来像这样(这里的x
表示NULL):
产品-A 产品-B 产品-C
产品-A x 2 0
产品-B 1 x 1
产品-C 0 0 x
...但我还想深入了解特定产品,以了解不同的客户是如何从产品-X开始的。例如,如果我们正在查看产品-A的群体(即,在使用产品-A之后,他们以什么顺序使用后续产品?):
客户 产品-B 产品-C
001 2 3
003 2
我有一种感觉,我需要创建后者的查询来汇总创建前者,我想说使用PARTITION
和ROW_NUMBER
的巧妙用法将是我的起点,但我无法理解如何着手解决这个问题。
非常感谢任何帮助或指导!"
英文:
I have tried to tackle this myself, but I'm hitting my usual noob-ish problem that I can't seem to conceptualise how I'd break this problem down to solve it in SQL (BigQuery).
Effectively, I am trying to ascertain how customers proceed to purchase products, depending on their initial product — but I'd like to understand both what the n+1 matrix is as well as the n+x drilldown is for a particular starting product.
For example, if I have the following data:
Customer Month Product Revenue
-------- ----- ------- -------
001 2019-01 Product-A 1.00
001 2019-02 Product-B 5.00
001 2019-02 Product-A 2.50
001 2019-03 Product-C 10.00
002 2019-01 Product-B 3.00
002 2019-02 Product-A 4.00
003 2019-02 Product-A 1.50
003 2019-05 Product-B 2.10
I'd expect the n+1 matrix (i.e., after product-x, how many customers bought product-y next?) to look like (x
there to represent NULL):
Product-A Product-B Product-C
Product-A x 2 0
Product-B 1 x 1
Product-C 0 0 x
...but I'd also want to drill into a particular product to understand how different customers started from Product-X. For example, if we were looking at the Product-A cohorts (i.e., after using Product-A, in what order did they use subsequent products?):
Customer Product-B Product-C
001 2 3
003 2
I have a feeling that I need to create the latter query to aggregate to create the former, and I want to say some clever use of PARTITION
ing and using ROW_NUMBER
would be my starting point, but I just cannot wrap my head around how I'd start to tackle this.
Any help or pointers would be super-appreciated!
答案1
得分: 2
Answering your first question only...
您关于PARTITION BY和ROW_NUMBER的直觉是正确的!
with
data as (
select *,row_number() over (partition by Customer order by Month ASC) as rn
from <dataset>.<table>
),
first_product as (
select * from data where rn = 1
),
second_product as (
select * from data where rn = 2
),
joined as (
select first_product.Product, second_product.Product, count(*) as ct
from first_product
left join second_product using(Customer)
group by 1,2
)
select * from joined
请注意,上述查询不会进行数据透视(这是一个不同的问题,通常可以在电子表格或BI应用程序中更轻松地解决,特别是因为我们不知道您可能有多少个产品)。
此外,请注意,希望您有一个比'month'更具体的日期类型,以便实际确定真正的顺序(在您的示例中,Customer 001在2月份购买了2种产品,这样很难确定哪个实际上是第二个)。
最后,我很好奇您的示例输出中为什么有'x'。如果一个客户购买A,然后再次购买A,您是否寻找下一个非-A购买来回答您的问题?这只是一些关于您的业务案例的思考。
英文:
Answering your first question only...
Your intuition for PARTITION BY and ROW_NUMBER is good!
with
data as (
select *,row_number() over (partition by Customer order by Month ASC) as rn
from <dataset>.<table>
),
first_product as (
select * from data where rn = 1
),
second_product as (
select * from data where rn = 2
),
joined as (
select first_product.Product, second_product.Product, count(*) as ct
from first_product
left join second_product using(Customer)
group by 1,2
)
select * from joined
Note, the above query does not pivot (that is a different question imo, and can usually be solved easier in a spreadsheet or BI application, especially as we are not aware of how many Products you might have).
Also note, I hope you have a more specific date type than 'month' so you can actually determine the true order (in your example, Customer 001 has 2 products in Feb, making it hard to tell which is actually 2nd).
Final thought: I am curious as to why your sample output has x
's. If a customer purchases A, and then purchases A again, are you looking for the next non-A purchase to answer your question? Just something to think about for your business case.
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论