How to categorize Google Analytics data into channels based on entrance URL when dealing with large datasets in BigQuery?

Question

I am working with Google Analytics data stored in BigQuery and I need to categorize visits into channels (organic, newsletter, and paid) based on the entrance URL. However, the dataset is quite large, spanning several terabytes, and I am unsure about the correct approach to efficiently handle this task. Currently, my code retrieves the entrance URL for each visit, but I need help expanding it to categorize visits into channels. Here's my existing code:

SELECT
  clientid,
  visitid,
  visitnumber,
  (SELECT h.page.pagepath FROM UNNEST(hits) h WHERE h.isentrance = true) AS entrance_url
FROM
  `test.test.ga_sessions_*`
WHERE
  _table_suffix BETWEEN '20230301' AND '20230628'

Could someone please guide me on the correct approach to categorize visits into channels based on the entrance URL while efficiently handling the large dataset in BigQuery? Thank you!

Example entrance URLs (modified for privacy):

/ca/ca/shop/parcel-tracking?order=&zip=&country=CA
/ca/ca/shop/faqs
/ca/ca/shop/newsletter/unsubscribe?shop=CA&lang=en&uid=&cid=&llid=&emaid=&sc_src=email_&sc_customer=&sc_llid=&sc_lid=&sc_uid=&emst=**********_
/us/us/shop
/us/us/shop/swimwear
/ca/ca/shop/women
/us/us/shop/pyjama-trousers-**********/1

Note: The country codes (e.g., "ca", "us") and product names have been randomly generated to protect privacy while preserving the essence of the data.

Thank you in advance!

Answer 1

Score: 1

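Most of the cost of this task comes from the amount of data scanned, not from the classification logic. BigQuery stores tables by column and, under on-demand pricing, bills for the bytes in the columns a query reads. Selecting only the fields you need and restricting _table_suffix to your date range keeps that scan small. To categorize visits into channels based on the entrance URL, you can use the following query: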

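WITH
  channel_mapping AS (
  -- One row holding an ordered list of (channel, url_pattern) pairs.
  -- List order is the priority: the first matching pattern wins.
  SELECT [
    STRUCT('organic' AS channel, '/google/' AS url_pattern),
    STRUCT('newsletter' AS channel, '/newsletter/' AS url_pattern),
    STRUCT('paid' AS channel, '/cpc/' AS url_pattern)
  ] AS patterns ),
  sessions AS (
  SELECT
    clientid,
    visitid,
    visitnumber,
    -- ORDER BY ... LIMIT 1 guarantees a single value even if several hits
    -- in a session are flagged as entrances.
    (SELECT h.page.pagepath FROM UNNEST(hits) h WHERE h.isentrance = true ORDER BY h.hitnumber LIMIT 1) AS entrance_url
  FROM
    `test.test.ga_sessions_*`
  WHERE
    -- Only the daily tables in this date range are scanned.
    _table_suffix BETWEEN '20230301' AND '20230628' )
SELECT
  s.clientid,
  s.visitid,
  s.visitnumber,
  s.entrance_url,
  -- Returns at most one channel per session. A scalar subquery that
  -- returned several rows (a URL matching two patterns) would make the
  -- query fail. Sessions that match no pattern get NULL.
  (SELECT p.channel
   FROM UNNEST(m.patterns) AS p WITH OFFSET AS pos
   WHERE s.entrance_url LIKE CONCAT('%', p.url_pattern, '%')
   ORDER BY pos
   LIMIT 1) AS channel
FROM
  sessions AS s
CROSS JOIN
  channel_mapping AS m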

You can modify the channel_mapping CTE to add or remove channels as needed. The URL patterns shown ('/google/', '/newsletter/', '/cpc/') are only placeholders. Replace them with substrings that actually identify each channel in your entrance URLs.
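Before running the classification over terabytes of sessions, it can help to check the patterns against a few known URLs. The following is a minimal sketch, not part of the original answer. It applies first-match pattern classification to the example entrance URLs from the question, which are passed in as an inline array so that no ga_sessions_* table is read. Two choices here are illustrative assumptions: the sample_urls CTE name is hypothetical, and the array layout of channel_mapping (one row holding an ordered list of STRUCTs) is only one way to write the mapping.

```sql
WITH
  -- Ordered (channel, url_pattern) pairs; the patterns are placeholders.
  channel_mapping AS (
  SELECT [
    STRUCT('organic' AS channel, '/google/' AS url_pattern),
    STRUCT('newsletter' AS channel, '/newsletter/' AS url_pattern),
    STRUCT('paid' AS channel, '/cpc/' AS url_pattern)
  ] AS patterns ),
  -- Hypothetical CTE: the question's example entrance URLs as an inline array.
  sample_urls AS (
  SELECT entrance_url
  FROM UNNEST([
    '/ca/ca/shop/parcel-tracking?order=&zip=&country=CA',
    '/ca/ca/shop/faqs',
    '/ca/ca/shop/newsletter/unsubscribe?shop=CA&lang=en&uid=&cid=&llid=&emaid=&sc_src=email_&sc_customer=&sc_llid=&sc_lid=&sc_uid=&emst=**********_',
    '/us/us/shop',
    '/us/us/shop/swimwear',
    '/ca/ca/shop/women',
    '/us/us/shop/pyjama-trousers-**********/1']) AS entrance_url )
SELECT
  u.entrance_url,
  -- First matching pattern in list order; NULL when nothing matches.
  (SELECT p.channel
   FROM UNNEST(m.patterns) AS p WITH OFFSET AS pos
   WHERE u.entrance_url LIKE CONCAT('%', p.url_pattern, '%')
   ORDER BY pos
   LIMIT 1) AS channel
FROM sample_urls AS u
CROSS JOIN channel_mapping AS m
```

With these placeholder patterns, only the newsletter unsubscribe URL is assigned a channel ('newsletter'). The other six come back NULL, which signals that the organic and paid patterns need replacing before the query is run on the full dataset.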

huangapple
  • Posted on June 29, 2023, 19:11:00
  • If reposting, please keep the link to this article: https://go.coder-hub.com/76580493.html