英文:
SQL: Counting the Number of Times a "Type" of Row Appears
问题
以下是我的尝试,用于在Netezza中计算每种年份组合出现的次数:
WITH CTE AS (
SELECT id, year, ROW_NUMBER() OVER (PARTITION BY id ORDER BY year) AS rn
FROM MY_TABLE
),
CTE2 AS (
SELECT id, MAX(rn) AS max_rn
FROM CTE
GROUP BY id
),
CTE3 AS (
SELECT CTE2.id, CTE.year, CTE.rn, CTE2.max_rn
FROM CTE2
JOIN CTE ON CTE2.id = CTE.id
),
CTE4 AS (
SELECT id,
MAX(CASE WHEN rn = 1 THEN year END) ||
MAX(CASE WHEN rn = 2 THEN ',' || year END) ||
MAX(CASE WHEN rn = 3 THEN ',' || year END) ||
MAX(CASE WHEN rn = 4 THEN ',' || year END) ||
MAX(CASE WHEN rn = 5 THEN ',' || year END) ||
MAX(CASE WHEN rn = 6 THEN ',' || year END) ||
MAX(CASE WHEN rn = 7 THEN ',' || year END) ||
MAX(CASE WHEN rn = 8 THEN ',' || year END) ||
MAX(CASE WHEN rn = 9 THEN ',' || year END) ||
MAX(CASE WHEN rn = 10 THEN ',' || year END)
AS combination
FROM CTE3
GROUP BY id
)
SELECT combination, COUNT(*) AS freq
FROM CTE4
GROUP BY combination
ORDER BY freq DESC;
希望这次计算的结果是正确的。
英文:
I am working with Netezza SQL. I have the following table:
CREATE TABLE MY_TABLE (
id VARCHAR(50),
year VARCHAR(50)
);
INSERT INTO MY_TABLE (id, year)
VALUES ('123', '2010');
INSERT INTO MY_TABLE (id, year)
VALUES ('123', '2011');
INSERT INTO MY_TABLE (id, year)
VALUES ('123', '2012');
INSERT INTO MY_TABLE (id, year)
VALUES ('124', '2011');
INSERT INTO MY_TABLE (id, year)
VALUES ('124', '2012');
INSERT INTO MY_TABLE (id, year)
VALUES ('124', '2013');
INSERT INTO MY_TABLE (id, year)
VALUES ('125', '2011');
INSERT INTO MY_TABLE (id, year)
VALUES ('125', '2012');
INSERT INTO MY_TABLE (id, year)
VALUES ('125', '2013');
INSERT INTO MY_TABLE (id, year)
VALUES ('127', '2011');
INSERT INTO MY_TABLE (id, year)
VALUES ('127', '2012');
INSERT INTO MY_TABLE (id, year)
VALUES ('127', '2015');
INSERT INTO MY_TABLE (id, year)
VALUES ('126', '2019');
My Question: I am trying to find out the number of times each combination of years appears. The final result would look like this:
combination freq
1 2011,2012,2015 1
2 2011,2012,2013 2
3 2010, 2011, 2012 1
4 2019 1
Here is my attempt at doing this in Netezza:
WITH CTE AS (
SELECT id, year, ROW_NUMBER() OVER (PARTITION BY id ORDER BY year) AS rn
FROM MY_TABLE
),
CTE2 AS (
SELECT id, MAX(rn) AS max_rn
FROM CTE
GROUP BY id
),
CTE3 AS (
SELECT CTE2.id, CTE.year, CTE.rn, CTE2.max_rn
FROM CTE2
JOIN CTE ON CTE2.id = CTE.id
),
CTE4 AS (
SELECT id,
MAX(CASE WHEN rn = 1 THEN year END) ||
MAX(CASE WHEN rn = 2 THEN ',' || year END) ||
MAX(CASE WHEN rn = 3 THEN ',' || year END) ||
MAX(CASE WHEN rn = 4 THEN ',' || year END) ||
MAX(CASE WHEN rn = 5 THEN ',' || year END) ||
MAX(CASE WHEN rn = 6 THEN ',' || year END) ||
MAX(CASE WHEN rn = 7 THEN ',' || year END) ||
MAX(CASE WHEN rn = 8 THEN ',' || year END) ||
MAX(CASE WHEN rn = 9 THEN ',' || year END) ||
MAX(CASE WHEN rn = 10 THEN ',' || year END)
AS combination
FROM CTE3
GROUP BY id
)
SELECT combination, COUNT(*) AS freq
FROM CTE4
GROUP BY combination
ORDER BY freq DESC;
But the calculations are completely wrong:
combination freq
1 <NA> 94
2 2010,2012,2013,2014,2015,2016,2017,2018,2019,2020 1
3 2010,2011,2012,2013,2014,2015,2016,2017,2019,2020 1
4 2010,2011,2012,2013,2014,2016,2017,2018,2019,2020 1
5 2010,2011,2012,2014,2015,2016,2017,2018,2019,2020 1
6 2010,2011,2013,2014,2015,2016,2017,2018,2019,2020 1
7 2010,2011,2012,2013,2014,2015,2016,2017,2018,2019 1
Can someone please show me how to fix this?
Thanks!
-
Note 1: I am using rn = 10 to account for the min/max range of years (which in my case is 10). I only have years 2010, 2011, ... 2020 in my dataset. Based on this information - is it possible to write a manual solution for individual years using UNION ALL style statements?
-
Note 2: Unfortunately, there is no GROUP_CONCAT function in Netezza - otherwise the following code would have worked:
Here is a sample:
SELECT GROUP_CONCAT(year, ',' ORDER BY year) AS combination, COUNT(*) AS freq
FROM my_data
GROUP BY id
ORDER BY freq DESC;
- Note 3: Here is R code to accomplish a similar task:
Source: https://stackoverflow.com/questions/74664615/counting-number-of-unique-column-values-per-group
agg <- aggregate(year ~ id, my_table, paste, collapse = ", ")
final = as.data.frame(table(agg$year))
答案1
得分: 3
由于Netezza缺少LISTAGG、STRING_AGG和GROUP_CONCAT函数,因此这会稍微复杂一些。
阅读到Netezza软件基于PostgreSQL 7.2,我在9.6版本上创建了这个示例,这是dbfiddle.uk上最早可用的版本。
CREATE TABLE MY_TABLE (
id VARCHAR(50),
year VARCHAR(50)
);
INSERT INTO MY_TABLE (id, year) VALUES ('123', '2010');
INSERT INTO MY_TABLE (id, year) VALUES ('123', '2011');
INSERT INTO MY_TABLE (id, year) VALUES ('123', '2012');
INSERT INTO MY_TABLE (id, year) VALUES ('124', '2011');
INSERT INTO MY_TABLE (id, year) VALUES ('124', '2012');
INSERT INTO MY_TABLE (id, year) VALUES ('124', '2013');
INSERT INTO MY_TABLE (id, year) VALUES ('125', '2011');
INSERT INTO MY_TABLE (id, year) VALUES ('125', '2012');
INSERT INTO MY_TABLE (id, year) VALUES ('125', '2013');
INSERT INTO MY_TABLE (id, year) VALUES ('127', '2011');
INSERT INTO MY_TABLE (id, year) VALUES ('127', '2012');
INSERT INTO MY_TABLE (id, year) VALUES ('127', '2015');
INSERT INTO MY_TABLE (id, year) VALUES ('126', '2019');
WITH CTE AS (
SELECT id, year, ROW_NUMBER() OVER (PARTITION BY id ORDER BY year) AS rn
FROM MY_TABLE
),
CTE3 AS (
SELECT id
, year
, rn
, MAX(rn) OVER (PARTITION BY id) AS MaxRN
FROM CTE
),
CTE4 AS (
SELECT id
, COALESCE(MAX(CASE WHEN rn = 1 THEN year END), '') ||
COALESCE(MAX(CASE WHEN rn = 2 THEN ',' || year END), '') ||
COALESCE(MAX(CASE WHEN rn = 3 THEN ',' || year END), '') ||
COALESCE(MAX(CASE WHEN rn = 4 THEN ',' || year END), '') ||
COALESCE(MAX(CASE WHEN rn = 5 THEN ',' || year END), '') ||
COALESCE(MAX(CASE WHEN rn = 6 THEN ',' || year END), '') ||
COALESCE(MAX(CASE WHEN rn = 7 THEN ',' || year END), '') ||
COALESCE(MAX(CASE WHEN rn = 8 THEN ',' || year END), '') ||
COALESCE(MAX(CASE WHEN rn = 9 THEN ',' || year END), '') ||
COALESCE(MAX(CASE WHEN rn = 10 THEN ',' || year END), '')
AS combination
FROM CTE3
GROUP BY id
)
SELECT combination
, COUNT(id) AS freq
FROM CTE4
GROUP BY combination
需要使用COALESCE()
函数,因为SELECT 'a' || NULL
会返回NULL。如果没有这个函数,结果将会是:
combination | freq |
---|---|
null | 5 |
英文:
Since Netezza lacks LISTAGG, STRING_AGG, and GROUP_CONCAT, this is a bit more challenging.
Reading that Netezza software was based on PostgreSQL 7.2, I created this against 9.6 -- the oldest available on dbfiddle.uk.
CREATE TABLE MY_TABLE (
id VARCHAR(50),
year VARCHAR(50)
);
INSERT INTO MY_TABLE (id, year) VALUES ('123', '2010');
INSERT INTO MY_TABLE (id, year) VALUES ('123', '2011');
INSERT INTO MY_TABLE (id, year) VALUES ('123', '2012');
INSERT INTO MY_TABLE (id, year) VALUES ('124', '2011');
INSERT INTO MY_TABLE (id, year) VALUES ('124', '2012');
INSERT INTO MY_TABLE (id, year) VALUES ('124', '2013');
INSERT INTO MY_TABLE (id, year) VALUES ('125', '2011');
INSERT INTO MY_TABLE (id, year) VALUES ('125', '2012');
INSERT INTO MY_TABLE (id, year) VALUES ('125', '2013');
INSERT INTO MY_TABLE (id, year) VALUES ('127', '2011');
INSERT INTO MY_TABLE (id, year) VALUES ('127', '2012');
INSERT INTO MY_TABLE (id, year) VALUES ('127', '2015');
INSERT INTO MY_TABLE (id, year) VALUES ('126', '2019');
WITH CTE AS (
SELECT id, year, ROW_NUMBER() OVER (PARTITION BY id ORDER BY year) AS rn
FROM MY_TABLE
),
CTE3 AS (
select id
, year
, rn
, max(rn) over (partition by id) as MaxRN
from cte
),
CTE4 AS (
SELECT id
, coalesce(max(CASE WHEN rn = 1 THEN year END), '') ||
coalesce(max(CASE WHEN rn = 2 THEN ',' || year END), '') ||
coalesce(max(CASE WHEN rn = 3 THEN ',' || year END), '') ||
coalesce(max(CASE WHEN rn = 4 THEN ',' || year END), '') ||
coalesce(max(CASE WHEN rn = 5 THEN ',' || year END), '') ||
coalesce(max(CASE WHEN rn = 6 THEN ',' || year END), '') ||
coalesce(max(CASE WHEN rn = 7 THEN ',' || year END), '') ||
coalesce(max(CASE WHEN rn = 8 THEN ',' || year END), '') ||
coalesce(max(CASE WHEN rn = 9 THEN ',' || year END), '') ||
coalesce(max(CASE WHEN rn = 10 THEN ',' || year END), '')
AS combination
FROM CTE3
GROUP BY id
)
SELECT combination
, count(id) as freq
FROM CTE4
GROUP BY combination
COALESCE()
is needed because SELECT 'a' || NULL
returns NULL. Without this the result would be
combination | freq |
---|---|
null | 5 |
答案2
得分: 1
我不使用Netezza,也没有任何访问Netezza fiddle/playground以测试这个的途径。只是根据文档来看,似乎这样做会起作用:
with yeargroups (years) as
(
SELECT LISTAGG(cast(year as varchar(400)), ', ') WITHIN GROUP(ORDER BY year)
FROM My_Table
GROUP BY id
)
select years as Combination, count(*) as Frequency
from yeargroups
group by years;
PS:varchar(400)是为了适应100年(100*4)的情况,根据Netezza文档。
英文:
(Would be a mess in comments)
I don't use Netezza, nor have any access to a Netezza fiddle\playground to test this. Just from documentation it looks like this would do:
with yeargroups (years) as
(
SELECT LISTAGG(cast(year as varchar(400)), ', ') WITHIN GROUP(ORDER BY year)
FROM My_Table
GROUP BY id
)
select years as Combination, count(*) as Frequency
from yeargrooups
group by years;
PS: varchar(400) is for accommodating to 100 years (100*4) as per Netezza doc.
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论