如何在Google BigQuery中计算页面浏览量和退出次数(与GA4关联)?

huangapple go评论82阅读模式
英文:

How to count page Views and Exits in Google Big Query (linked to GA4)?

问题

在Google BigQuery中,我正在尝试通过查询来计算页面浏览次数和“退出”次数。

我编写了以下代码来按页面路径计算页面浏览次数,但仍在尝试计算“退出”次数,其中“退出”计算了会话的最后一个事件发生在特定屏幕上的次数,换句话说,结束在页面路径上的会话:

SELECT
  event_params.value.string_value AS page_path,
  COUNT(*) AS page_views
FROM
  `MY_ga4_dataset.events_*`,
  UNNEST(event_params) AS event_params
WHERE
  _table_suffix BETWEEN '20230207' AND '20230207'
  AND event_name = 'page_view'
  AND event_params.key = 'page_location'
GROUP BY
  page_path
ORDER BY
  page_views DESC

能否提出解决方案?

这是我的模式:

(您提供的模式内容已经超出了一次性翻译的字符限制,请分开提问或缩减文本后重新提问)

英文:

In Google BigQuery linked to GA4, I am trying to count the number of views and "Exits" via query

如何在Google BigQuery中计算页面浏览量和退出次数(与GA4关联)?

I wrote the following code to count the views by page path, however still struggling to count the "Exits", where where “Exits” counts the number of times that the last event recorded for a session occurred on a particular screen, in other words sessions that ended on a page path

  SELECT
  event_params.value.string_value AS page_path,
  COUNT(*) AS page_views
FROM
  `MY_ga4_dataset.events_*`,
  UNNEST(event_params) AS event_params
WHERE
  _table_suffix BETWEEN '20230207' AND '20230207'
  AND event_name = 'page_view'
  AND event_params.key = 'page_location'
GROUP BY
  page_path
ORDER BY
  page_views DESC

Could you propose the solution?

This is my Schema:

fullname	mode	type	description
event_date	NULLABLE	STRING	
event_timestamp	NULLABLE	INTEGER	
event_name	NULLABLE	STRING	
event_params	REPEATED	RECORD	
event_previous_timestamp	NULLABLE	INTEGER	
event_value_in_usd	NULLABLE	FLOAT	
event_bundle_sequence_id	NULLABLE	INTEGER	
event_server_timestamp_offset	NULLABLE	INTEGER	
user_id	NULLABLE	STRING	
user_pseudo_id	NULLABLE	STRING	
privacy_info	NULLABLE	RECORD	
user_properties	REPEATED	RECORD	
user_first_touch_timestamp	NULLABLE	INTEGER	
user_ltv	NULLABLE	RECORD	
device	NULLABLE	RECORD	
geo	NULLABLE	RECORD	
app_info	NULLABLE	RECORD	
traffic_source	NULLABLE	RECORD	
stream_id	NULLABLE	STRING	
platform	NULLABLE	STRING	
event_dimensions	NULLABLE	RECORD	
ecommerce	NULLABLE	RECORD	
items	REPEATED	RECORD	
event_params.key	NULLABLE	STRING	
event_params.value	NULLABLE	RECORD	
event_params.value.string_value	NULLABLE	STRING	
event_params.value.int_value	NULLABLE	INTEGER	
event_params.value.float_value	NULLABLE	FLOAT	
event_params.value.double_value	NULLABLE	FLOAT	
privacy_info.analytics_storage	NULLABLE	STRING	
privacy_info.ads_storage	NULLABLE	STRING	
privacy_info.uses_transient_token	NULLABLE	STRING	
user_properties.key	NULLABLE	STRING	
user_properties.value	NULLABLE	RECORD	
user_properties.value.string_value	NULLABLE	STRING	
user_properties.value.int_value	NULLABLE	INTEGER	
user_properties.value.float_value	NULLABLE	FLOAT	
user_properties.value.double_value	NULLABLE	FLOAT	
user_properties.value.set_timestamp_micros	NULLABLE	INTEGER	
user_ltv.revenue	NULLABLE	FLOAT	
user_ltv.currency	NULLABLE	STRING	
device.category	NULLABLE	STRING	
device.mobile_brand_name	NULLABLE	STRING	
device.mobile_model_name	NULLABLE	STRING	
device.mobile_marketing_name	NULLABLE	STRING	
device.mobile_os_hardware_model	NULLABLE	STRING	
device.operating_system	NULLABLE	STRING	
device.operating_system_version	NULLABLE	STRING	
device.vendor_id	NULLABLE	STRING	
device.advertising_id	NULLABLE	STRING	
device.language	NULLABLE	STRING	
device.is_limited_ad_tracking	NULLABLE	STRING	
device.time_zone_offset_seconds	NULLABLE	INTEGER	
device.browser	NULLABLE	STRING	
device.browser_version	NULLABLE	STRING	
device.web_info	NULLABLE	RECORD	
device.web_info.browser	NULLABLE	STRING	
device.web_info.browser_version	NULLABLE	STRING	
device.web_info.hostname	NULLABLE	STRING	
geo.continent	NULLABLE	STRING	
geo.country	NULLABLE	STRING	
geo.region	NULLABLE	STRING	
geo.city	NULLABLE	STRING	
geo.sub_continent	NULLABLE	STRING	
geo.metro	NULLABLE	STRING	
app_info.id	NULLABLE	STRING	
app_info.version	NULLABLE	STRING	
app_info.install_store	NULLABLE	STRING	
app_info.firebase_app_id	NULLABLE	STRING	
app_info.install_source	NULLABLE	STRING	
traffic_source.name	NULLABLE	STRING	
traffic_source.medium	NULLABLE	STRING	
traffic_source.source	NULLABLE	STRING	
event_dimensions.hostname	NULLABLE	STRING	
ecommerce.total_item_quantity	NULLABLE	INTEGER	
ecommerce.purchase_revenue_in_usd	NULLABLE	FLOAT	
ecommerce.purchase_revenue	NULLABLE	FLOAT	
ecommerce.refund_value_in_usd	NULLABLE	FLOAT	
ecommerce.refund_value	NULLABLE	FLOAT	
ecommerce.shipping_value_in_usd	NULLABLE	FLOAT	
ecommerce.shipping_value	NULLABLE	FLOAT	
ecommerce.tax_value_in_usd	NULLABLE	FLOAT	
ecommerce.tax_value	NULLABLE	FLOAT	
ecommerce.unique_items	NULLABLE	INTEGER	
ecommerce.transaction_id	NULLABLE	STRING	
items.item_id	NULLABLE	STRING	
items.item_name	NULLABLE	STRING	
items.item_brand	NULLABLE	STRING	
items.item_variant	NULLABLE	STRING	
items.item_category	NULLABLE	STRING	
items.item_category2	NULLABLE	STRING	
items.item_category3	NULLABLE	STRING	
items.item_category4	NULLABLE	STRING	
items.item_category5	NULLABLE	STRING	
items.price_in_usd	NULLABLE	FLOAT	
items.price	NULLABLE	FLOAT	
items.quantity	NULLABLE	INTEGER	
items.item_revenue_in_usd	NULLABLE	FLOAT	
items.item_revenue	NULLABLE	FLOAT	
items.item_refund_in_usd	NULLABLE	FLOAT	
items.item_refund	NULLABLE	FLOAT	
items.coupon	NULLABLE	STRING	
items.affiliation	NULLABLE	STRING	
items.location_id	NULLABLE	STRING	
items.item_list_id	NULLABLE	STRING	
items.item_list_name	NULLABLE	STRING	
items.item_list_index	NULLABLE	STRING	
items.promotion_id	NULLABLE	STRING	
items.promotion_name	NULLABLE	STRING	
items.creative_name	NULLABLE	STRING	
items.creative_slot	NULLABLE	STRING	

答案1

得分: 0

以下是代码的翻译部分:

"BigQuery does not count exits so it has to guess based on last event, which is a different methodology to GA4 which means the number of exits will always be slightly different":
"BigQuery 不计算退出,因此它必须根据最后一个事件来猜测,这与 GA4 不同,这意味着退出的数量始终会略有不同"

"Here is a code to show the previous and next page visited showing entrances and exits":
"以下是一个代码,用于显示先前和后续访问的页面,显示入口和退出"

接下来是代码的详细翻译,但我只会提供代码中的注释部分的翻译:

  • "with events as (":代码部分开始,定义一个名为 "events" 的临时数据表。

  • "select user_pseudo_id,":选择用户伪 ID。

  • "concat(":将字符串连接起来。

  • "(case when(":开始一个条件语句。

  • "select value.string_value":选择值为字符串类型的数据。

  • "from unnest(event_params)":从事件参数中获取数据。

  • "where key = 'session_engaged' AND event_name = 'page_view'":条件:键为 'session_engaged' 且事件名称为 'page_view'。

  • ")=1 then 1 else 0 end) as engaged_sessions_id,":如果条件满足,则 engaged_sessions_id 为 1,否则为 0。

  • "event_name,":选择事件名称。

  • "event_timestamp,":选择事件时间戳。

  • "regexp_replace(":使用正则表达式替换。

  • "regexp_replace(":再次使用正则表达式替换。

  • "select p.value.string_value":选择值为字符串类型的数据。

  • "from unnest(event_params) as p":从事件参数中获取数据并将其命名为 "p"。

  • "where p.key = 'page_location'":条件:键为 'page_location'。

  • "), r'^https?://[^/]+', ''":替换正则表达式匹配的部分为 ''。

  • "r'[?].*', ''":替换正则表达式匹配的部分为 ''。

  • ") as page_path,":将结果命名为 "page_path"。

  • "event_date as date":选择事件日期并将其命名为 "date"。

  • "FROM bigquery-3.analytics_2.events_*":从指定的 BigQuery 表格中获取数据。

  • "where _table_suffix between '20221129' and '20221130')":筛选数据,使其在指定日期范围内。

  • "select":选择以下列。

  • "page_path,":选择页面路径。

  • "date,":选择日期。

  • "event_timestamp,":选择事件时间戳。

  • "if(":开始条件语句。

  • "event_name = 'page_view',":如果事件名称为 'page_view'。

  • "coalesce(":如果前一个条件满足,则返回第一个非 NULL 值。

  • "last_value(":获取最后一个值。

  • "if(event_name = 'page_view', page_path, null) ignore nulls":如果事件名称为 'page_view',则返回页面路径,否则返回 NULL。

  • "over(":在指定窗口内。

  • "partition by unique_session_id":按唯一会话 ID 进行分区。

  • "order by event_timestamp asc rows between unbounded preceding and 1 preceding":按事件时间戳升序排序,考虑未限定的前导行和前 1 行。

  • "), '(entrance)'":如果未满足前一个条件,则返回 '(entrance)'。

  • "null":否则返回 NULL。

  • "as previous_page,":将结果命名为 "previous_page"。

  • "if(":开始条件语句。

  • "event_name = 'page_view',":如果事件名称为 'page_view'。

  • "coalesce(":如果前一个条件满足,则返回第一个非 NULL 值。

  • "first_value(":获取第一个值。

  • "if(event_name = 'page_view', page_path, null) ignore nulls":如果事件名称为 'page_view',则返回页面路径,否则返回 NULL。

  • "over(":在指定窗口内。

  • "partition by unique_session_id":按唯一会话 ID 进行分区。

  • "order by event_timestamp asc rows between 1 following and unbounded following":按事件时间戳升序排序,考虑后 1 行和未限定的后续行。

  • "), '(exit)'":如果未满足前一个条件,则返回 '(exit)'。

  • "null":否则返回 NULL。

  • "as next_page":将结果命名为 "next_page"。

  • "from":从以下数据表获取数据。

  • "events":数据表名称。

  • "where event_name = 'page_view'":筛选事件名称为 'page_view' 的数据。

英文:

BigQuery does not count exits so it has to guess based on last event, which is a different methodology to GA4 which means the number of exits will always be slightly different

Here is a code to show the previous and next page visited showing entrances and exits

    with events as (
      select
     user_pseudo_id,
--get seesion ID
     concat(
        user_pseudo_id,
         (
        select
          value.int_value
        from
          unnest(event_params)
        where
          key = 'ga_session_id'
      )
    ) as unique_session_id,

      (case when(
        select
          value.string_value
        from
          unnest(event_params)
        where
          key = 'session_engaged' AND event_name = 'page_view'
      ) ="1" then 1 else 0 end) as engaged_sessions_id,
    event_name,
    event_timestamp,
    regexp_replace(
  regexp_replace(
    (
      select
        p.value.string_value
      from
        unnest(event_params) as p
      where
        p.key = 'page_location'
    ),
    r'^https?://[^/]+',
    ''
  ),
  r'[\?].*',
  ''
) as page_path,
event_date as date

FROM 
`bigquery-3.analytics_2.events_*`

where
 _table_suffix between '20221129'and '20221130')

select

  page_path,
  date,
  event_timestamp,

  -- look for the previous page_path
  if(
    event_name = 'page_view',
    coalesce(
      last_value(
        if(event_name = 'page_view', page_path, null) ignore nulls
      ) over(
        partition by unique_session_id
        order by
          event_timestamp asc rows between unbounded preceding
          and 1 preceding
      ),
      '(entrance)'
    ),
    null
  ) as previous_page,
  -- look for the next page_path
  if(
    event_name = 'page_view',
    coalesce(
      first_value(
        if(event_name = 'page_view', page_path, null) ignore nulls
      ) over(
        partition by unique_session_id
        order by
          event_timestamp asc rows between 1 following
          and unbounded following
      ),
      '(exit)'
    ),
    null
  ) as next_page
from
  events
  where
 event_name = 'page_view'

huangapple
  • 本文由 发表于 2023年2月16日 19:02:49
  • 转载请务必保留本文链接:https://go.coder-hub.com/75471322.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定