How can I have 1 line per id, instead of 2, in Snowflake?

Question

I want to know which page a user visited first, second, and so on within a session.
I have already managed to get the order of the interactions with the ROW_NUMBER function.

However, I want the flow of the interaction laid out in columns: taking the session queried below as an example, instead of 2 lines I want 1 line with both the FIRST and SECOND columns filled in.

Here's my code:

```sql
SELECT
    ID,
    PAGE_TITLE,
    FIRST,
    SECOND
FROM (
SELECT
    distinct
    concat(user_pseudo_id, ga_session_id) as id,
    page_title,
    ROW_NUMBER() OVER (PARTITION BY user_pseudo_id, ga_session_id ORDER BY event_timestamp) AS interaction_number,
    iff(interaction_number = 1, page_title, '') as first,
    iff(interaction_number = 2, page_title, '') as second
FROM CDP_GA4_RAW.RAW_EVENTS_PIVOT
where id = '1000002360.16826696111682669619'
GROUP BY page_title, event_timestamp, user_pseudo_id, ga_session_id, ga_session_number
order by id)
;
```



# Answer 1
**Score**: 1

You could try something like this:

```sql
SELECT
    id,  -- one output row per session id
    MAX(CASE WHEN interaction_number = 1 THEN page_title ELSE '' END) AS first,
    MAX(CASE WHEN interaction_number = 2 THEN page_title ELSE '' END) AS second,
    MAX(CASE WHEN interaction_number = 3 THEN page_title ELSE '' END) AS third,
    MAX(CASE WHEN interaction_number = 4 THEN page_title ELSE '' END) AS fourth
FROM (
    SELECT
        concat(user_pseudo_id, ga_session_id) AS id,
        page_title,
        TO_TIMESTAMP(event_timestamp) AS event_timestamp,
        -- number each page view within its session, oldest first
        ROW_NUMBER() OVER (PARTITION BY user_pseudo_id, ga_session_id ORDER BY TO_TIMESTAMP(event_timestamp)) AS interaction_number
    FROM CDP_GA4_RAW.RAW_EVENTS_PIVOT
    GROUP BY page_title, event_timestamp, user_pseudo_id, ga_session_id, ga_session_number
    -- no ORDER BY here: sorting inside a subquery has no effect on the final result
)
GROUP BY id
;
```
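
To see the conditional-aggregation pattern collapse a session into a single row, here is a minimal self-contained sketch. The CTE and its three rows are made-up sample data, not the real `CDP_GA4_RAW.RAW_EVENTS_PIVOT` table, and `ga_session_id` is assumed to be a string. The `ELSE ''` branch is left out here on purpose, so an unvisited slot would come back as NULL rather than an empty string.

```sql
-- Hypothetical sample data: one session with three page views.
WITH raw_events_pivot (user_pseudo_id, ga_session_id, page_title, event_timestamp) AS (
    SELECT * FROM VALUES
        ('1.', '10', 'aaa', 100),
        ('1.', '10', 'aab', 101),
        ('1.', '10', 'aac', 102)
),
numbered AS (
    -- Number the page views inside each session, oldest first.
    SELECT
        CONCAT(user_pseudo_id, ga_session_id) AS id,
        page_title,
        ROW_NUMBER() OVER (PARTITION BY user_pseudo_id, ga_session_id ORDER BY event_timestamp) AS interaction_number
    FROM raw_events_pivot
)
-- MAX ignores NULLs, so each column keeps the single title matching its position.
SELECT
    id,
    MAX(CASE WHEN interaction_number = 1 THEN page_title END) AS first,
    MAX(CASE WHEN interaction_number = 2 THEN page_title END) AS second,
    MAX(CASE WHEN interaction_number = 3 THEN page_title END) AS third
FROM numbered
GROUP BY id;
```

With this sample data the query should return a single row for id `1.10`, with `aaa`, `aab` and `aac` in the three columns.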




# Answer 2
**Score**: 1

With a CTE for fake data, we can reproduce your example:

```sql
with RAW_EVENTS_PIVOT(user_pseudo_id, ga_session_id, page_title, event_timestamp, ga_session_number) as (
    select * from values
    ('1.', 10, 'aaa', 100, 1000),
    ('1.', 10, 'aaa', 101, 1000),
    ('1.', 10, 'aaa', 102, 1000)
)
SELECT
    ID,
    PAGE_TITLE,
    FIRST,
    SECOND
FROM (
    SELECT
        distinct
        concat(user_pseudo_id, ga_session_id) as id,
        page_title,
        ROW_NUMBER() OVER (PARTITION BY user_pseudo_id, ga_session_id ORDER BY event_timestamp) AS interaction_number,
        iff(interaction_number = 1, page_title, '') as first,
        iff(interaction_number = 2, page_title, '') as second
    FROM RAW_EVENTS_PIVOT
    where id = '1.10'
    GROUP BY page_title, event_timestamp, user_pseudo_id, ga_session_id, ga_session_number
    order by id
);
```

| ID | PAGE_TITLE | FIRST | SECOND |
|---|---|---|---|
| 1.1 | aaa | aaa | |
| 1.1 | aaa | | aaa |
| 1.1 | aaa | | |

Notes: `ORDER BY` should be avoided in non-final statements, as it is meaningless there but still costs compute time. `DISTINCT` should not be needed, as you are already doing a `GROUP BY`. However, you are grouping by a column (`ga_session_number`) that appears in neither the SELECT list nor the `ROW_NUMBER`, so you could get identical selected rows that `DISTINCT` then throws away (unless `ga_session_id` and `ga_session_number` are locked together?).
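
As a rough sketch of those notes, the inner query can rely on `GROUP BY` alone for deduplication and sort only at the outermost level. The CTE rows are made-up sample data (the second row is a deliberate exact duplicate), and dropping `ga_session_number` assumes it carries no extra information beyond `ga_session_id`, which may not hold for your table.

```sql
-- Hypothetical sample data; the second row is an exact duplicate event.
WITH raw_events_pivot (user_pseudo_id, ga_session_id, page_title, event_timestamp) AS (
    SELECT * FROM VALUES
        ('1.', '10', 'aaa', 100),
        ('1.', '10', 'aaa', 100),
        ('1.', '10', 'aab', 101)
)
SELECT
    CONCAT(user_pseudo_id, ga_session_id) AS id,
    page_title,
    ROW_NUMBER() OVER (PARTITION BY user_pseudo_id, ga_session_id ORDER BY event_timestamp) AS interaction_number
FROM raw_events_pivot
-- GROUP BY alone removes the duplicate; no DISTINCT is needed,
-- and every grouped column is either selected or used by the window.
GROUP BY user_pseudo_id, ga_session_id, page_title, event_timestamp
-- ORDER BY appears only on the outermost statement.
ORDER BY id, interaction_number;
```

With this sample data the duplicate event should collapse, leaving two rows: `aaa` as interaction 1 and `aab` as interaction 2.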

Given that you are ranking the page titles as first/second, this implies they change within a session. But you are also grouping by them, so it seems you want to select all distinct pages visited, plus the first and second ones.

To be honest, I cannot guess how you want to deduplicate the data, as you have used too many conflicting strategies.

But there is a function, `NTH_VALUE`, which does what you have done, like so:

```sql
with RAW_EVENTS_PIVOT(user_pseudo_id, ga_session_id, page_title, event_timestamp, ga_session_number) as (
    select * from values
    ('1.', '10', 'aaa', 100, 1000),
    ('1.', '10', 'aab', 101, 1000),
    ('1.', '10', 'aac', 102, 1000)
)
SELECT
    DISTINCT
    concat(user_pseudo_id, ga_session_id) as id,
    page_title,
    NTH_VALUE(page_title, 1) OVER (PARTITION BY user_pseudo_id, ga_session_id ORDER BY event_timestamp) as first,
    NTH_VALUE(page_title, 2) OVER (PARTITION BY user_pseudo_id, ga_session_id ORDER BY event_timestamp) as second
FROM RAW_EVENTS_PIVOT
;
```
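
Note that the query above still selects `page_title`, so `DISTINCT` keeps one row per distinct page. If the goal is strictly one row per id, a possible variation is to leave `page_title` out of the SELECT list. This is only a sketch on the same made-up sample data; the explicit window frame is spelled out so that every row sees the whole session rather than relying on the default frame.

```sql
-- Hypothetical sample data: one session with three page views.
WITH raw_events_pivot (user_pseudo_id, ga_session_id, page_title, event_timestamp) AS (
    SELECT * FROM VALUES
        ('1.', '10', 'aaa', 100),
        ('1.', '10', 'aab', 101),
        ('1.', '10', 'aac', 102)
)
-- Every row of a session computes the same (id, first, second) triple,
-- so DISTINCT collapses the session to a single row.
SELECT DISTINCT
    CONCAT(user_pseudo_id, ga_session_id) AS id,
    NTH_VALUE(page_title, 1) OVER (
        PARTITION BY user_pseudo_id, ga_session_id ORDER BY event_timestamp
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS first,
    NTH_VALUE(page_title, 2) OVER (
        PARTITION BY user_pseudo_id, ga_session_id ORDER BY event_timestamp
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS second
FROM raw_events_pivot;
```

With this sample data the result should be one row: id `1.10`, first `aaa`, second `aab`.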

huangapple
  • Posted on July 24, 2023, 20:01:03
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76754298.html