英文:
SQL: Determining Favorite Color on A Specific Date?
问题
我使用Netezza SQL工作。
假设存在一个世界,人们有自己喜欢的颜色。我们假设他们永远拥有这种颜色 - 直到他们决定改变自己的颜色。如果这种情况发生,新的颜色现在就是他们的最爱,直到另行通知。
每个人在同一年内可以多次更改自己的颜色。每个人也可以在同一年内多次重新确认自己现有的最爱颜色。
我已经在以下表格("my_table")中表示了这些信息:
id color date
1 111 red 2005-01-01
2 111 blue 2005-05-05
3 111 red 2007-01-01
4 222 red 2003-01-01
5 222 blue 2003-01-03
6 333 red 2004-01-01
7 333 red 2004-05-05
8 333 blue 2009-01-01
9 444 red 1999-01-01
10 444 blue 2010-01-01
11 555 blue 2020-01-01
12 666 blue 2000-01-01
13 666 red 2004-01-01
14 777 blue 2004-01-01
15 777 red 2006-01-01
使用这个表格,我们可以推断出以下信息:
- 在2005年,id=111改变了他们的颜色一次,然后在2007年再次改变了颜色 - 从那时起,他们的最爱颜色就是红色,直到另行通知。
- 在2003年,id=222改变了他们的颜色 - 他们的最爱颜色是蓝色,直到另行通知。
- 在2005年,id=444的最爱颜色是红色。
- id=555直到2020年才出现在表格中。
我的问题:
- 假设我们在2005-01-01之前选择每个ID最近的颜色。
- 接下来,我们只选择在2005-01-01之前喜欢红色的ID。
- 这些ID在2005-01-01和2015-01-01之间改变了多少次颜色?
最终答案看起来像这样:
id color_changes
1 111 2
2 333 1
3 444 1
4 666 0
我到目前为止尝试了什么:
我试图通过编写以下查询来解决这个问题,以找出每个人截至到2015-01-01的最爱颜色:
WITH CTE AS (
SELECT id, color, date,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY date DESC) AS rn
FROM my_table
WHERE date <= '2015-01-01'
)
SELECT id, color
FROM CTE
WHERE rn = 1;
但根据结果 - 我认为这不正确(例如,id=333在2005-01-01有一个最爱颜色是红色...但在SQL查询中,它显示为蓝色):
id color
1 111 red
2 222 blue
3 333 blue
4 444 blue
5 555 blue
6 666 red
7 777 red
我现在卡住了,无法达到期望的结果。请有人可以告诉我如何纠正这个问题吗?
谢谢!
英文:
I am working with Netezza SQL.
Suppose there is a universe where people have a favorite color. We assume that they have this color forever - until they decide to change their color. If this happens, this new color is now their favorite color until further notice.
Each person can change their color multiple times in the same year. Each person can re-confirm their existing favorite color in the same year multiple times as well.
I have represented this information in the following table ("my_table"):
id color date
1 111 red 2005-01-01
2 111 blue 2005-05-05
3 111 red 2007-01-01
4 222 red 2003-01-01
5 222 blue 2003-01-03
6 333 red 2004-01-01
7 333 red 2004-05-05
8 333 blue 2009-01-01
9 444 red 1999-01-01
10 444 blue 2010-01-01
11 555 blue 2020-01-01
12 666 blue 2000-01-01
13 666 red 2004-01-01
14 777 blue 2004-01-01
15 777 red 2006-01-01
Using this table, we can infer information such as:
- In 2005, id=111 changed their color once and then again in 2007 - and from that point on, their favorite color is red until further notice
- In 2003, id=222 changed their color - and their favorite color is blue until further notice
- In 2005, the favorite color for id = 444 would be red
- id = 555 did not appear in the table until 2020
My Question:
- Suppose we take the most recent color for each ID as of 2005-01-01
- Next, we only select ID's that had a favorite color of Red as of 2005-01-01
- How many times did these ID's change their color between 2005-01-01 and 2015-01-01?
The final answer will look something like this:
id color_changes
1 111 2
2 333 1
3 444 1
4 666 0
What I tried so far:
I tried to start this problem by writing the following query to find out the favorite color of each person as of 2015-01-01:
WITH CTE AS (
SELECT id, color, date,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY date DESC) AS rn
FROM my_table
WHERE date <= '2015-01-01'
)
SELECT id, color
FROM CTE
WHERE rn = 1;
But based on the results - I don't think this is correct (e.g. id=333 has a favorite color of red in 2005-01-01 ... but in the SQL query, it is coming as blue):
id color
1 111 red
2 222 blue
3 333 blue
4 444 blue
5 555 blue
6 666 red
7 777 red
I am now stuck and can't achieve the desired result. Can someone please show me how to correct this?
Thanks!
答案1
得分: 2
在您的查询中,您正在查找2005-01-01日期的每个Id的颜色,但您正在使用2015-01-01日期(您似乎指出了两个日期,但您的结果是2005年的)。
这可能比起初看起来要复杂一些,但我认为以下内容将为您提供所需的结果。
首先,我对行进行编号,从最接近目标日期的颜色开始编号为1,选择在目标日期最普遍的颜色,并使用标志指示后续条目是否更改了颜色。
然后,在第二个公用表达式(CTE)中,基于颜色在目标日期上是所需的“红色”的累积总和,这会为所有“有效”行提供一个标志。
最后,只选择有效行,总结指示颜色已更改的标志。
with flags as (
select *,
Row_Number() over(partition by case when date <= '20050101' then id end order by date desc) rn,
first_value(color) over(partition by case when date <= '20050101' then id end order by date desc) ColAtDate,
case when Lag(color) over(partition by id order by date) <> color then 1 end changed
from t
), v as (
select *,
Sum(case when colatdate = 'red' and rn = 1 then 1 end) over(partition by id order by rn) valid
from flags
where date <= '20150101'
)
select id, Coalesce(Sum(case when rn > 1 then changed end), 0)
from v
where valid = 1
group by id
order by id;
您没有指定具体的产品,所以这应该适用于任何符合ISO标准的关系型数据库管理系统。
请查看这个Fiddle演示。
英文:
In your query you are looking for the color per Id as at 2005-01-01 but you're using 2015-01-01 (You seem to indicate both dates but your results are for 2005)
This is possibly a little tricker than it first appears but I think the following will give you your desired result.
Initially I number the rows starting at 1 being the color closest to the target date, and choose the color that's prevalent at the target date, and also a flag to indicate if subsequent entries have a changed color.
Then in a second CTE a cumulative sum based on the color being the desired "red" at the target date, this gives a flag against all the "valid" rows.
Finally selecting only the valid row, sum the flags indicating the color has changed.
with flags as (
select *,
Row_Number() over(partition by case when date <= '20050101' then id end order by date desc) rn,
first_value(color) over(partition by case when date <= '20050101' then id end order by date desc) ColAtDate,
case when Lag(color) over(partition by id order by date) <> color then 1 end changed
from t
), v as (
select *,
Sum(case when colatdate = 'red' and rn = 1 then 1 end) over(partition by id order by rn) valid
from flags
where date <= '20150101'
)
select id, Coalesce(Sum(case when rn > 1 then changed end), 0)
from v
where valid = 1
group by id
order by id;
You haven't indicated a specific product so this should work in any iso-compliant RDBMS
See this Fiddle demo
答案2
得分: 1
你可以首先找到首次记录颜色为“red”的id
,然后通过count
函数找到聚合的颜色更改:
with cte as (
select c1.* from (select c.id, min(c.date) dt from colors c group by c.id) c1
join colors c2 on c2.id = c1.id and c2.date = c1.dt
where c2.color = 'red' and c1.id <= date('2005-01-01')
)
select c.id, count(distinct c.color) color_changes from colors c
where c.id in (select c2.id from cte c2) and c.date >= date('2005-01-01') and c.date <= date('2015-01-01')
group by c.id
union all
select c.id, 0 color_changes from colors c where c.date <= date('2005-01-01') and c.color = 'red' and c.id not in (select c2.id from cte c2)
英文:
You can first find the id
s whose first recorded color is red
, and then find the aggregated color changes via count
:
with cte as (
select c1.* from (select c.id, min(c.date) dt from colors c group by c.id) c1
join colors c2 on c2.id = c1.id and c2.date = c1.dt
where c2.color = 'red' and c1.id <= date('2005-01-01')
)
select c.id, count(distinct c.color) color_changes from colors c
where c.id in (select c2.id from cte c2) and c.date >= date('2005-01-01') and c.date <= date('2015-01-01')
group by c.id
union all
select c.id, 0 color_changes from colors c where c.date <= date('2005-01-01') and c.color = 'red' and c.id not in (select c2.id from cte c2)
答案3
得分: 1
这是您想要的翻译:
所以你想要求两个集合的交集;1. 2005-01-01 上是红色的 ID 集合,与 2. 在 2005 年到 2015 年之间的过渡次数的 ID 集合。
首先是 2005 年的红色 ID,
选择 id 从我的表 x
其中日期 = (选择日期最大值从我的表 y
其中 x.id=y.id 和日期 <= 日期 '2005-01-01')
并且颜色='红色'
这是一个标准的相关子查询。
过渡计数有点棘手,因为您的数据包含一些不是过渡的记录(从 '红色' 到 '红色'),必须忽略。您可以通过对记录进行排序并使用 Lag 窗口函数来比较先前记录的相同颜色与不同颜色。
选择 id, count(*) 作为颜色更改
从 (
选择 id, 日期,
情况当颜色=Lag(颜色)
在 (按日期排序的 ID 分区)
然后 '忽略' else '计数' end as 过渡
从我的表中
)
其中日期在 '2005-01-01' 和 '2015-01-01' 之间
并且过渡='计数'
分组按 id
现在左外连接集合 1 和集合 2,你完成了。
选择 A.id, B.color_changes 从 (
选择 id 从我的表 x
其中日期 = (选择日期最大值从我的表 y
其中 x.id=y.id 和日期 <= 日期 '2005-01-01')
并且颜色='红色'
) A 左外连接 (
选择 id, count(*) 作为颜色更改
从 (
选择 id, 日期,
情况当颜色=Lag(颜色)
在 (按日期排序的 ID 分区)
然后 '忽略' else '计数' end as 过渡
从我的表中
)
其中日期在 '2005-01-01' 和 '2015-01-01' 之间
并且过渡='计数'
分组按 id
) B 在 A.id=B.id
按 A.id 排序
英文:
So you want to intersect two sets; 1. The set of ids that were red on 2005-01-01, with 2. The set of ids with their transition count between 2005 & 2015.
First the reds on 2005,
Select id From my_table x
Where date = (Select max(date) from my_table y
Where x.id=y.id and date <= DATE '2005-01-01')
And color='red'
That's a standard correlated subquery.
The transition count is a little trickier, because your data contains some non-transitions (from 'red' to 'red') that must be ignored. You do this by ordering the records and comparing the prior record for same vs. different color with the Lag window function.
Select id, count(*) as color_changes
From (
Select id, date,
Case When color=Lag(color)
Over (Partition By ID Order by date)
Then 'ignore' else 'count' end as transition
From my_table
)
Where date between '2005-01-01' and '2015-01-01'
And transition='count'
Group By id
Now Left Outer Join set 1 and set 2 and you're done.
Select A.id, B.color_changes From (
Select id From my_table x
Where date = (Select max(date) from my_table y
Where x.id=y.id and date <= DATE '2005-01-01')
And color='red'
) A Left Outer Join (
Select id, count(*) as color_changes
From (
Select id, date,
Case When color=Lag(color)
Over (Partition By ID Order by date)
Then 'ignore' else 'count' end as transition
From my_table
)
Where date between '2005-01-01' and '2015-01-01'
And transition='count'
Group By id
) B On A.id=B.id
Order by A.id
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论