How can I get one row per ID, instead of two, in Snowflake?
# Question
I want to know which page a user visited 1st, 2nd (and so on) in a session.
I already managed to get the interaction flow with the ROW_NUMBER function.
However, I want one column per step of that flow: in the following example, instead of 2 rows I want 1 row with both the FIRST and SECOND columns filled in.
Here's my code:
```sql
SELECT
    ID,
    PAGE_TITLE,
    FIRST,
    SECOND
FROM (
    SELECT
        distinct
        concat(user_pseudo_id, ga_session_id) as id,
        page_title,
        ROW_NUMBER() OVER (PARTITION BY user_pseudo_id, ga_session_id ORDER BY event_timestamp) AS interaction_number,
        iff(interaction_number = 1, page_title, '') as first,
        iff(interaction_number = 2, page_title, '') as second
    FROM CDP_GA4_RAW.RAW_EVENTS_PIVOT
    where id = '1000002360.16826696111682669619'
    GROUP BY page_title, event_timestamp, user_pseudo_id, ga_session_id, ga_session_number
    order by id
);
```
# Answer 1
**Score**: 1
Can you do something like this?
```sql
SELECT
    id,
    -- one column per position; MAX keeps the only non-empty title for each id
    MAX(CASE WHEN interaction_number = 1 THEN page_title ELSE '' END) as first,
    MAX(CASE WHEN interaction_number = 2 THEN page_title ELSE '' END) as second,
    MAX(CASE WHEN interaction_number = 3 THEN page_title ELSE '' END) as third,
    MAX(CASE WHEN interaction_number = 4 THEN page_title ELSE '' END) as fourth
FROM (
    -- number the page views inside each session, oldest first
    SELECT
        concat(user_pseudo_id, ga_session_id) as id,
        page_title,
        TO_TIMESTAMP(EVENT_TIMESTAMP) AS EVENT_TIMESTAMP,
        ROW_NUMBER() OVER (PARTITION BY user_pseudo_id, ga_session_id ORDER BY TO_TIMESTAMP(event_timestamp)) AS interaction_number
    FROM CDP_GA4_RAW.RAW_EVENTS_PIVOT
    GROUP BY page_title, event_timestamp, user_pseudo_id, ga_session_id, ga_session_number
)
GROUP BY id
ORDER BY id;
```
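An alternative to writing one MAX(CASE ...) column per position is Snowflake's PIVOT operator. This is a sketch under stated assumptions, not part of the answer: the CTE holds hypothetical fake rows that only mirror the columns of CDP_GA4_RAW.RAW_EVENTS_PIVOT, and, as with the CASE version, only the positions listed in IN (...) become columns.

```sql
-- Hypothetical fake rows standing in for CDP_GA4_RAW.RAW_EVENTS_PIVOT
with RAW_EVENTS_PIVOT(user_pseudo_id, ga_session_id, page_title, event_timestamp) as (
    select * from values
        ('1.', '10', 'aaa', 100),
        ('1.', '10', 'aab', 101),
        ('1.', '10', 'aac', 102),
        ('2.', '20', 'bbb', 200)
),
numbered as (
    -- number the page views inside each session, oldest first
    select
        concat(user_pseudo_id, ga_session_id) as id,
        page_title,
        ROW_NUMBER() OVER (PARTITION BY user_pseudo_id, ga_session_id ORDER BY event_timestamp) as interaction_number
    from RAW_EVENTS_PIVOT
)
-- PIVOT implicitly groups by the remaining column (id) and turns each
-- listed interaction_number value into its own column
select *
from numbered
    PIVOT (MAX(page_title) FOR interaction_number IN (1, 2, 3, 4))
    as p (id, first, second, third, fourth)
order by id;
```

With these rows it should return one row per session: 1.10 with aaa, aab, aac and an empty fourth column, and 2.20 with only the first column filled. Unlike the CASE version, missing positions come back as NULL rather than ''.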
# Answer 2
**Score**: 1
So, with a CTE for fake data, we can reproduce your example:
```sql
with RAW_EVENTS_PIVOT(user_pseudo_id, ga_session_id, page_title, event_timestamp, ga_session_number) as (
    select * from values
        ('1.', 10, 'aaa', 100, 1000),
        ('1.', 10, 'aaa', 101, 1000),
        ('1.', 10, 'aaa', 102, 1000)
)
SELECT
    ID,
    PAGE_TITLE,
    FIRST,
    SECOND
FROM (
    SELECT
        distinct
        concat(user_pseudo_id, ga_session_id) as id,
        page_title,
        ROW_NUMBER() OVER (PARTITION BY user_pseudo_id, ga_session_id ORDER BY event_timestamp) AS interaction_number,
        iff(interaction_number = 1, page_title, '') as first,
        iff(interaction_number = 2, page_title, '') as second
    FROM RAW_EVENTS_PIVOT
    where id = '1.10'
    GROUP BY page_title, event_timestamp, user_pseudo_id, ga_session_id, ga_session_number
    order by id
);
```
ID | PAGE_TITLE | FIRST | SECOND |
---|---|---|---|
1.10 | aaa | aaa | |
1.10 | aaa | | aaa |
1.10 | aaa | | |
Notes: ORDER BY should be avoided in non-final statements (such as this subquery), as it is meaningless there but still costs compute time. DISTINCT should not be needed, as you are already doing a GROUP BY; but you are grouping by a column (ga_session_number) that appears neither in the SELECT list nor in the ROW_NUMBER, so you could get identical selected rows and then throw some data away (unless ga_session_id and ga_session_number are locked together?).

Given that you are ranking the page_titles as first/second, this implies they change within a session; but you are also grouping by page_title, so it seems you want to select all distinct pages visited, and the first/second pages as well.

To be honest, I cannot guess how you want to deduplicate the data, as you are using too many conflicting strategies.

But there is a function, NTH_VALUE, which does what your ROW_NUMBER/IFF combination does, like so:
```sql
with RAW_EVENTS_PIVOT(user_pseudo_id, ga_session_id, page_title, event_timestamp, ga_session_number) as (
    select * from values
        ('1.', '10', 'aaa', 100, 1000),
        ('1.', '10', 'aab', 101, 1000),
        ('1.', '10', 'aac', 102, 1000)
)
SELECT
    DISTINCT
    concat(user_pseudo_id, ga_session_id) as id,
    page_title,
    NTH_VALUE(page_title, 1) OVER (PARTITION BY user_pseudo_id, ga_session_id ORDER BY event_timestamp) as first,
    NTH_VALUE(page_title, 2) OVER (PARTITION BY user_pseudo_id, ga_session_id ORDER BY event_timestamp) as second
FROM RAW_EVENTS_PIVOT
;
```
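If the goal is the single row per session from the question, one way to build on the NTH_VALUE query is a sketch like the following: leave page_title out of the SELECT list, so the per-event rows of a session become identical and DISTINCT collapses them. It uses the same fake data; the explicit ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING frame and the third column are added assumptions, there so that every row sees the whole session regardless of the default frame.

```sql
with RAW_EVENTS_PIVOT(user_pseudo_id, ga_session_id, page_title, event_timestamp, ga_session_number) as (
    select * from values
        ('1.', '10', 'aaa', 100, 1000),
        ('1.', '10', 'aab', 101, 1000),
        ('1.', '10', 'aac', 102, 1000)
)
SELECT DISTINCT
    concat(user_pseudo_id, ga_session_id) as id,
    -- the full-partition frame gives every row of a session the same values,
    -- so DISTINCT leaves one row per session
    NTH_VALUE(page_title, 1) OVER (PARTITION BY user_pseudo_id, ga_session_id ORDER BY event_timestamp
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as first,
    NTH_VALUE(page_title, 2) OVER (PARTITION BY user_pseudo_id, ga_session_id ORDER BY event_timestamp
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as second,
    NTH_VALUE(page_title, 3) OVER (PARTITION BY user_pseudo_id, ga_session_id ORDER BY event_timestamp
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as third
FROM RAW_EVENTS_PIVOT;
```

With this data it should return a single row: 1.10, aaa, aab, aac. For sessions with fewer pages than requested positions, NTH_VALUE returns NULL for the missing ones.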