按首个产品进行同期分析

huangapple go评论58阅读模式
英文:

Cohort Analysis by First Product

问题

以下是您要翻译的内容:

"我已经尝试过自己解决这个问题,但我遇到了我通常的新手问题,似乎无法概念化如何将这个问题拆分为SQL(BigQuery)中的解决方案。

实际上,我正在尝试确定客户如何根据他们的初始产品购买产品,但我想了解n+1矩阵以及n+x下钻是什么,对于特定的起始产品。

例如,如果我有以下数据:

客户      月份      产品      收入
--------   -----    -------     -------
001        2019-01  产品-A     1.00
001        2019-02  产品-B     5.00
001        2019-02  产品-A     2.50
001        2019-03  产品-C     10.00
002        2019-01  产品-B     3.00
002        2019-02  产品-A     4.00
003        2019-02  产品-A     1.50
003        2019-05  产品-B     2.10

我期望n+1矩阵(即,在产品-x之后,有多少客户购买了产品-y?)看起来像这样(这里的x表示NULL):

           产品-A    产品-B    产品-C
产品-A     x          2          0 
产品-B     1          x          1
产品-C     0          0          x

...但我还想深入了解特定产品,以了解不同的客户是如何从产品-X开始的。例如,如果我们正在查看产品-A的群体(即,在使用产品-A之后,他们以什么顺序使用后续产品?):

客户      产品-B     产品-C
001       2           3
003       2

我有一种感觉,我需要创建后者的查询来汇总创建前者,我想说使用PARTITIONROW_NUMBER的巧妙用法将是我的起点,但我无法理解如何着手解决这个问题。

非常感谢任何帮助或指导!"

英文:

I have tried to tackle this myself, but I'm hitting my usual noob-ish problem that I can't seem to conceptualise how I'd break this problem down to solve it in SQL (BigQuery).

Effectively, I am trying to ascertain how customers proceed to purchase products, depending on their initial product — but I'd like to understand both what the n+1 matrix is as well as the n+x drilldown is for a particular starting product.

For example, if I have the following data:

Customer   Month    Product     Revenue
--------   -----    -------     -------
001        2019-01  Product-A   1.00
001        2019-02  Product-B   5.00
001        2019-02  Product-A   2.50
001        2019-03  Product-C   10.00
002        2019-01  Product-B   3.00
002        2019-02  Product-A   4.00
003        2019-02  Product-A   1.50
003        2019-05  Product-B   2.10

I'd expect the n+1 matrix (i.e., after product-x, how many customers bought product-y next?) to look like (x there to represent NULL):

           Product-A  Product-B  Product-C
Product-A     x          2          0 
Product-B     1          x          1
Product-C     0          0          x

...but I'd also want to drill into a particular product to understand how different customers started from Product-X. For example, if we were looking at the Product-A cohorts (i.e., after using Product-A, in what order did they use subsequent products?):

Customer  Product-B   Product-C
001       2           3
003       2

I have a feeling that I need to create the latter query to aggregate to create the former, and I want to say some clever use of PARTITIONing and using ROW_NUMBER would be my starting point, but I just cannot wrap my head around how I'd start to tackle this.

Any help or pointers would be super-appreciated!

答案1

得分: 2

Answering your first question only...

您关于PARTITION BY和ROW_NUMBER的直觉是正确的!

with 
data as (
  select *,row_number() over (partition by Customer order by Month ASC) as rn 
  from <dataset>.<table>
),
first_product as (
  select * from data where rn = 1
),
second_product as (
  select * from data where rn = 2
),
joined as (
  select first_product.Product, second_product.Product, count(*) as ct
  from first_product
  left join second_product using(Customer)
  group by 1,2
)
select * from joined

请注意,上述查询不会进行数据透视(这是一个不同的问题,通常可以在电子表格或BI应用程序中更轻松地解决,特别是因为我们不知道您可能有多少个产品)。

此外,请注意,希望您有一个比'month'更具体的日期类型,以便实际确定真正的顺序(在您的示例中,Customer 001在2月份购买了2种产品,这样很难确定哪个实际上是第二个)。

最后,我很好奇您的示例输出中为什么有'x'。如果一个客户购买A,然后再次购买A,您是否寻找下一个非-A购买来回答您的问题?这只是一些关于您的业务案例的思考。

英文:

Answering your first question only...

Your intuition for PARTITION BY and ROW_NUMBER is good!

with 
data as (
  select *,row_number() over (partition by Customer order by Month ASC) as rn 
  from <dataset>.<table>
),
first_product as (
  select * from data where rn = 1
),
second_product as (
  select * from data where rn = 2
),
joined as (
  select first_product.Product, second_product.Product, count(*) as ct
  from first_product
  left join second_product using(Customer)
  group by 1,2
)
select * from joined

Note, the above query does not pivot (that is a different question imo, and can usually be solved easier in a spreadsheet or BI application, especially as we are not aware of how many Products you might have).

Also note, I hope you have a more specific date type than 'month' so you can actually determine the true order (in your example, Customer 001 has 2 products in Feb, making it hard to tell which is actually 2nd).

Final thought: I am curious as to why your sample output has x's. If a customer purchases A, and then purchases A again, are you looking for the next non-A purchase to answer your question? Just something to think about for your business case.

huangapple
  • 本文由 发表于 2020年1月4日 00:42:07
  • 转载请务必保留本文链接:https://go.coder-hub.com/59582193.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定