英文:
SQL: Counting the Number of Distinct Schools a Student Attended
问题
以下是翻译好的部分:
我的问题: 我想要找出:
- 对于任何在2010年上了“学校A”并且GPA大于5的学生
- 在2010年至2015年之间,这些学生上了多少所不同的学校?
这是我尝试为这个问题编写的查询 - 我首先确定了在2010年至2015年之间的所有行的子集,然后我“标记”了在2010年上了“学校A”并且GPA大于5的学生。最后,我使用了一个连接将所有这些内容合并在一起 - 并且使用了两层计数聚合来获得最终答案:
with cte_a as
(select * from my_table where school_year >= 2010 and school_year <= 2015),
cte_b as
(select distinct student_id from cte_a
where school = 'School A' and gpa > 5 and year = 2010)
select count_distinct_schools, count(student_id) from(
select t.student_id, count(distinct school) as count_distinct_schools
from my_table t
join cte_b
on t.student_id = cte_b.student_id
group by t.student_id)a
group by count_distinct_schools;
在最后一部分代码中,您应该使用cte_b
,因为您已经通过cte_b
子查询标记了符合条件的学生。这将确保您获得符合您问题要求的正确答案。
希望这可以帮助您正确执行查询,并获得您期望的结果格式。
英文:
I am working with Netezza SQL.
I have the following table (my_table) of student GPA over different years and what school they attended:
student_id year gpa school_year
1 1 2010 6 School A
2 1 2011 4 School B
3 1 2012 4 School A
4 2 2010 5 School A
5 2 2011 1 School B
6 2 2012 2 School B
7 3 2009 5 School A
8 3 2010 6 School A
9 3 2011 5 School A
10 4 2010 5 School A
11 4 2011 7 School B
12 4 2014 6 School C
13 5 2010 4 School C
14 5 2011 2 School A
15 5 2012 2 School A
My question: I want to find out:
- For any student that attended "School A" in 2010 and had a GPA > 5
- Between the years 2010-2015, how many distinct schools did these students attend?
Here is my attempt to write a query for this problem - I first identified a subset of all rows between 2010 and 2015, then I "tagged" students who attended School A in 2010 and had a GPA > 5. Finally, I used a join to bring all this together - and a two layered count aggregation to get the final answer:
with cte_a as
(select * from my_table where school_year >= 2010 and school_year<=2015),
cte_b as
(select distinct student_id from cte_a
where school = 'School A' and gpa>5 and school_year = 2010)
select count_1, count(student_id) from(
select t.student_id, count(distinct school) as count_1
from my_table t
join cte_b
on t.student_id = cte_b.student_id
group by t.student_id)a
group by count_1;
I am confused if in the last chunk of code, do I need to use cte_b
or my_table
.
Can someone please show me how to do this correctly? In the end, I am expecting the final answer in this format:
count_distinct_schools count(student_id)
2 1
1 1
Thanks!
答案1
得分: 2
不确定为什么需要双重汇总,但这样应该可以工作:
SELECT t.student_id, COUNT(DISTINCT school)
FROM my_table t
JOIN (
SELECT DISTINCT student_id
FROM my_table
WHERE school = '学校A' AND gpa > 5 AND year = 2010
) s ON s.student_id = t.student_id
WHERE t.year BETWEEN 2010 AND 2015
GROUP BY t.student_id
英文:
I am not sure why you need double aggregation, but shouldn't this work?
SELECT t.student_id, COUNT(DISTINCT school)
FROM my_table t
JOIN (
SELECT DISTINCT student_id
FROM my_table
WHERE school = 'School A' AND gpa > 5 AND year = 2010
) s ON s.student_id = t.student_id
WHERE t.year BETWEEN 2010 AND 2015
GROUP BY t.student_id
答案2
得分: 2
起初,我们找到了在2010年就读于“学校A”且GPA大于5的学生,使用CTE_Students
。
然后在主表中找到所有具有找到的学生ID的行,并对它们进行分组并计算学校数量。
WITH
CTE_Students
AS
(
SELECT student_id
FROM my_table
WHERE
school = 'School A'
AND gpa > 5
AND school_year = 2010
)
SELECT
my_table.student_id
,COUNT(DISTINCT school) AS SchoolCount
FROM
my_table
WHERE
my_table.student_id IN (SELECT CTE_Students.student_id FROM CTE_Students)
AND my_table.year >= 2010
AND my_table.year <= 2015
GROUP BY
my_table.student_id
;
如果您使用IN
而不是JOIN
,那么在第一个CTE中不需要获取学生ID的DISTINCT列表。
实际上,这并不重要。即使您将my_table
与CTE_Students
连接,并且CTE_Students
具有相同学生ID的多个行,最终学校的计数是DISTINCT的,因此所有可能的重复项都将被消除。
因此,在查找在2010年就读于“学校A”且GPA大于5的学生时,无需使用DISTINCT,在COUNT()
中的DISTINCT会处理它。
英文:
At first we find students that attended "School A" in 2010 and had a GPA > 5 using CTE_Students
.
Then in the main table find all rows with the found student IDs and group them and count the schools.
WITH
CTE_Students
AS
(
SELECT student_id
FROM my_table
WHERE
school = 'School A'
and gpa > 5
and school_year = 2010
)
SELECT
my_table.student_id
,COUNT(DISTINCT school) AS SchoolCount
FROM
my_table
WHERE
my_table.student_id IN (SELECT CTE_Students.student_id FROM CTE_Students)
AND my_table.year >= 2010
AND my_table.year <= 2015
GROUP BY
my_table.student_id
;
If you use IN
instead of JOIN
, then it is not necessary to get DISTINCT list of student IDs in the first CTE.
Actually, it doesn't matter. Even if you join my_table
with the CTE_Students
and CTE_Students
has several rows with the same student ID, in the end the count of schools is DISTINCT, so all possible duplicates will be eliminated.
So, there is no need to have DISTINCT when finding students that attended "School A" in 2010 and had a GPA > 5, the DISTINCT in the COUNT()
will take care of it.
答案3
得分: 1
我认为这应该可以工作。
;WITH cte
AS (
SELECT *
FROM my_table s
WHERE s.school = 'school_a'
AND s.school_year = 2010
AND s.gpa > 5.0
)
SELECT s.student_id
,count(DISTINCT s.school) AS NumOfDistinctSchools
FROM my_table s
WHERE s.student_id IN (
SELECT t.student_id
FROM cte t
)
AND s.school_year <= 2015
AND s.school_year >= 2010
GROUP BY s.student_id
英文:
I think this should work.
;WITH cte
AS (
SELECT *
FROM my_table s
WHERE s.school = 'school_a'
AND s.school_year = 2010
AND s.gpa > 5.0
)
SELECT s.student_id
,count(DISTINCT s.school) AS NumOfDistinctSchools
FROM my_table s
WHERE s.student_id IN (
SELECT t.student_id
FROM cte t
)
AND s.school_year <= 2015
AND s.school_year >= 2010
GROUP BY s.student_id
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论