按首个产品进行同期分析

huangapple go评论74阅读模式
英文:

Cohort Analysis by First Product

问题

以下是您要翻译的内容:

"我已经尝试过自己解决这个问题,但我遇到了我通常的新手问题,似乎无法概念化如何将这个问题拆分为SQL(BigQuery)中的解决方案。

实际上,我正在尝试确定客户如何根据他们的初始产品购买产品,但我想了解n+1矩阵以及n+x下钻是什么,对于特定的起始产品。

例如,如果我有以下数据:

  1. 客户 月份 产品 收入
  2. -------- ----- ------- -------
  3. 001 2019-01 产品-A 1.00
  4. 001 2019-02 产品-B 5.00
  5. 001 2019-02 产品-A 2.50
  6. 001 2019-03 产品-C 10.00
  7. 002 2019-01 产品-B 3.00
  8. 002 2019-02 产品-A 4.00
  9. 003 2019-02 产品-A 1.50
  10. 003 2019-05 产品-B 2.10

我期望n+1矩阵(即,在产品-x之后,有多少客户购买了产品-y?)看起来像这样(这里的x表示NULL):

  1. 产品-A 产品-B 产品-C
  2. 产品-A x 2 0
  3. 产品-B 1 x 1
  4. 产品-C 0 0 x

...但我还想深入了解特定产品,以了解不同的客户是如何从产品-X开始的。例如,如果我们正在查看产品-A的群体(即,在使用产品-A之后,他们以什么顺序使用后续产品?):

  1. 客户 产品-B 产品-C
  2. 001 2 3
  3. 003 2

我有一种感觉,我需要创建后者的查询来汇总创建前者,我想说使用PARTITIONROW_NUMBER的巧妙用法将是我的起点,但我无法理解如何着手解决这个问题。

非常感谢任何帮助或指导!"

英文:

I have tried to tackle this myself, but I'm hitting my usual noob-ish problem that I can't seem to conceptualise how I'd break this problem down to solve it in SQL (BigQuery).

Effectively, I am trying to ascertain how customers proceed to purchase products, depending on their initial product — but I'd like to understand both what the n+1 matrix is as well as the n+x drilldown is for a particular starting product.

For example, if I have the following data:

  1. Customer Month Product Revenue
  2. -------- ----- ------- -------
  3. 001 2019-01 Product-A 1.00
  4. 001 2019-02 Product-B 5.00
  5. 001 2019-02 Product-A 2.50
  6. 001 2019-03 Product-C 10.00
  7. 002 2019-01 Product-B 3.00
  8. 002 2019-02 Product-A 4.00
  9. 003 2019-02 Product-A 1.50
  10. 003 2019-05 Product-B 2.10

I'd expect the n+1 matrix (i.e., after product-x, how many customers bought product-y next?) to look like (x there to represent NULL):

  1. Product-A Product-B Product-C
  2. Product-A x 2 0
  3. Product-B 1 x 1
  4. Product-C 0 0 x

...but I'd also want to drill into a particular product to understand how different customers started from Product-X. For example, if we were looking at the Product-A cohorts (i.e., after using Product-A, in what order did they use subsequent products?):

  1. Customer Product-B Product-C
  2. 001 2 3
  3. 003 2

I have a feeling that I need to create the latter query to aggregate to create the former, and I want to say some clever use of PARTITIONing and using ROW_NUMBER would be my starting point, but I just cannot wrap my head around how I'd start to tackle this.

Any help or pointers would be super-appreciated!

答案1

得分: 2

Answering your first question only...

您关于PARTITION BY和ROW_NUMBER的直觉是正确的!

  1. with
  2. data as (
  3. select *,row_number() over (partition by Customer order by Month ASC) as rn
  4. from <dataset>.<table>
  5. ),
  6. first_product as (
  7. select * from data where rn = 1
  8. ),
  9. second_product as (
  10. select * from data where rn = 2
  11. ),
  12. joined as (
  13. select first_product.Product, second_product.Product, count(*) as ct
  14. from first_product
  15. left join second_product using(Customer)
  16. group by 1,2
  17. )
  18. select * from joined

请注意,上述查询不会进行数据透视(这是一个不同的问题,通常可以在电子表格或BI应用程序中更轻松地解决,特别是因为我们不知道您可能有多少个产品)。

此外,请注意,希望您有一个比'month'更具体的日期类型,以便实际确定真正的顺序(在您的示例中,Customer 001在2月份购买了2种产品,这样很难确定哪个实际上是第二个)。

最后,我很好奇您的示例输出中为什么有'x'。如果一个客户购买A,然后再次购买A,您是否寻找下一个非-A购买来回答您的问题?这只是一些关于您的业务案例的思考。

英文:

Answering your first question only...

Your intuition for PARTITION BY and ROW_NUMBER is good!

  1. with
  2. data as (
  3. select *,row_number() over (partition by Customer order by Month ASC) as rn
  4. from <dataset>.<table>
  5. ),
  6. first_product as (
  7. select * from data where rn = 1
  8. ),
  9. second_product as (
  10. select * from data where rn = 2
  11. ),
  12. joined as (
  13. select first_product.Product, second_product.Product, count(*) as ct
  14. from first_product
  15. left join second_product using(Customer)
  16. group by 1,2
  17. )
  18. select * from joined

Note, the above query does not pivot (that is a different question imo, and can usually be solved easier in a spreadsheet or BI application, especially as we are not aware of how many Products you might have).

Also note, I hope you have a more specific date type than 'month' so you can actually determine the true order (in your example, Customer 001 has 2 products in Feb, making it hard to tell which is actually 2nd).

Final thought: I am curious as to why your sample output has x's. If a customer purchases A, and then purchases A again, are you looking for the next non-A purchase to answer your question? Just something to think about for your business case.

huangapple
  • 本文由 发表于 2020年1月4日 00:42:07
  • 转载请务必保留本文链接:https://go.coder-hub.com/59582193.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定