SQL:计算某种类型的行出现的次数

huangapple go评论56阅读模式
英文:

SQL: Counting the Number of Times a "Type" of Row Appears

问题

以下是我的尝试,用于在Netezza中计算每种年份组合出现的次数:

WITH CTE AS (
    SELECT id, year, ROW_NUMBER() OVER (PARTITION BY id ORDER BY year) AS rn
    FROM MY_TABLE
),
CTE2 AS (
    SELECT id, MAX(rn) AS max_rn
    FROM CTE
    GROUP BY id
),
CTE3 AS (
    SELECT CTE2.id, CTE.year, CTE.rn, CTE2.max_rn
    FROM CTE2
    JOIN CTE ON CTE2.id = CTE.id
),
CTE4 AS (
    SELECT id,
        MAX(CASE WHEN rn = 1 THEN year END) ||
        MAX(CASE WHEN rn = 2 THEN ',' || year END) ||
        MAX(CASE WHEN rn = 3 THEN ',' || year END) ||
        MAX(CASE WHEN rn = 4 THEN ',' || year END) ||
        MAX(CASE WHEN rn = 5 THEN ',' || year END) ||
        MAX(CASE WHEN rn = 6 THEN ',' || year END) ||
        MAX(CASE WHEN rn = 7 THEN ',' || year END) ||
        MAX(CASE WHEN rn = 8 THEN ',' || year END) ||
        MAX(CASE WHEN rn = 9 THEN ',' || year END) ||
        MAX(CASE WHEN rn = 10 THEN ',' || year END)
        AS combination
    FROM CTE3
    GROUP BY id
)
SELECT combination, COUNT(*) AS freq
FROM CTE4
GROUP BY combination
ORDER BY freq DESC;

希望这次计算的结果是正确的。

英文:

I am working with Netezza SQL. I have the following table:

CREATE TABLE MY_TABLE (
    id VARCHAR(50),
    year VARCHAR(50)
);

INSERT INTO MY_TABLE (id, year)
VALUES ('123', '2010');
INSERT INTO MY_TABLE (id, year)
VALUES ('123', '2011');
INSERT INTO MY_TABLE (id, year)
VALUES ('123', '2012'); 

INSERT INTO MY_TABLE (id, year)
VALUES ('124', '2011');
INSERT INTO MY_TABLE (id, year)
VALUES ('124', '2012');
INSERT INTO MY_TABLE (id, year)
VALUES ('124', '2013');


INSERT INTO MY_TABLE (id, year)
VALUES ('125', '2011');
INSERT INTO MY_TABLE (id, year)
VALUES ('125', '2012');
INSERT INTO MY_TABLE (id, year)
VALUES ('125', '2013');

  INSERT INTO MY_TABLE (id, year)
    VALUES ('127', '2011');
    INSERT INTO MY_TABLE (id, year)
    VALUES ('127', '2012');
    INSERT INTO MY_TABLE (id, year)
    VALUES ('127', '2015');

  INSERT INTO MY_TABLE (id, year)
    VALUES ('126', '2019');

My Question: I am trying to find out the number of times each combination of years appears. The final result would look like this:

       combination freq
1   2011,2012,2015    1
2   2011,2012,2013    2
3 2010, 2011, 2012    1
4             2019    1

Here is my attempt at doing this in Netezza:

WITH CTE AS (
    SELECT id, year, ROW_NUMBER() OVER (PARTITION BY id ORDER BY year) AS rn
    FROM MY_TABLE
),
CTE2 AS (
    SELECT id, MAX(rn) AS max_rn
    FROM CTE
    GROUP BY id
),
CTE3 AS (
    SELECT CTE2.id, CTE.year, CTE.rn, CTE2.max_rn
    FROM CTE2
    JOIN CTE ON CTE2.id = CTE.id
),
CTE4 AS (
    SELECT id,
        MAX(CASE WHEN rn = 1 THEN year END) ||
        MAX(CASE WHEN rn = 2 THEN ',' || year END) ||
        MAX(CASE WHEN rn = 3 THEN ',' || year END) ||
        MAX(CASE WHEN rn = 4 THEN ',' || year END) ||
        MAX(CASE WHEN rn = 5 THEN ',' || year END) ||
        MAX(CASE WHEN rn = 6 THEN ',' || year END) ||
        MAX(CASE WHEN rn = 7 THEN ',' || year END) ||
        MAX(CASE WHEN rn = 8 THEN ',' || year END) ||
        MAX(CASE WHEN rn = 9 THEN ',' || year END) ||
        MAX(CASE WHEN rn = 10 THEN ',' || year END)
        AS combination
    FROM CTE3
    GROUP BY id
)
SELECT combination, COUNT(*) AS freq
FROM CTE4
GROUP BY combination
ORDER BY freq DESC;

But the calculations are completely wrong:

                                       combination freq
1                                              <NA>   94
2 2010,2012,2013,2014,2015,2016,2017,2018,2019,2020    1
3 2010,2011,2012,2013,2014,2015,2016,2017,2019,2020    1
4 2010,2011,2012,2013,2014,2016,2017,2018,2019,2020    1
5 2010,2011,2012,2014,2015,2016,2017,2018,2019,2020    1
6 2010,2011,2013,2014,2015,2016,2017,2018,2019,2020    1
7 2010,2011,2012,2013,2014,2015,2016,2017,2018,2019    1

Can someone please show me how to fix this?

Thanks!

  • Note 1: I am using rn = 10 to account for the min/max range of years (which in my case is 10). I only have years 2010, 2011, ... 2020 in my dataset. Based on this information - is it possible to write a manual solution for individual years using UNION ALL style statements?

  • Note 2: Unfortunately, there is no GROUP_CONCAT function in Netezza - otherwise the following code would have worked:

Here is a sample:

  SELECT GROUP_CONCAT(year, ',' ORDER BY year) AS combination, COUNT(*) AS freq
    FROM my_data
    GROUP BY id
    ORDER BY freq DESC;
  • Note 3: Here is R code to accomplish a similar task:

Source: https://stackoverflow.com/questions/74664615/counting-number-of-unique-column-values-per-group

agg <- aggregate(year ~ id, my_table, paste, collapse = ", ")
final = as.data.frame(table(agg$year))

答案1

得分: 3

由于Netezza缺少LISTAGG、STRING_AGG和GROUP_CONCAT函数,因此这会稍微复杂一些。

阅读到Netezza软件基于PostgreSQL 7.2,我在9.6版本上创建了这个示例,这是dbfiddle.uk上最早可用的版本。

CREATE TABLE MY_TABLE (
    id VARCHAR(50),
    year VARCHAR(50)
);

INSERT INTO MY_TABLE (id, year) VALUES ('123', '2010');
INSERT INTO MY_TABLE (id, year) VALUES ('123', '2011');
INSERT INTO MY_TABLE (id, year) VALUES ('123', '2012'); 

INSERT INTO MY_TABLE (id, year) VALUES ('124', '2011');
INSERT INTO MY_TABLE (id, year) VALUES ('124', '2012');
INSERT INTO MY_TABLE (id, year) VALUES ('124', '2013');

INSERT INTO MY_TABLE (id, year) VALUES ('125', '2011');
INSERT INTO MY_TABLE (id, year) VALUES ('125', '2012');
INSERT INTO MY_TABLE (id, year) VALUES ('125', '2013');

INSERT INTO MY_TABLE (id, year) VALUES ('127', '2011');
INSERT INTO MY_TABLE (id, year) VALUES ('127', '2012');
INSERT INTO MY_TABLE (id, year) VALUES ('127', '2015');

INSERT INTO MY_TABLE (id, year) VALUES ('126', '2019');

WITH CTE AS (
  SELECT id, year, ROW_NUMBER() OVER (PARTITION BY id ORDER BY year) AS rn
  FROM MY_TABLE
),
CTE3 AS (
  SELECT id
  , year
  , rn
  , MAX(rn) OVER (PARTITION BY id) AS MaxRN
  FROM CTE
),
CTE4 AS (  
  SELECT id
  , COALESCE(MAX(CASE WHEN rn = 1 THEN year END), '') ||
    COALESCE(MAX(CASE WHEN rn = 2  THEN ',' || year END), '') ||
    COALESCE(MAX(CASE WHEN rn = 3  THEN ',' || year END), '') ||
    COALESCE(MAX(CASE WHEN rn = 4  THEN ',' || year END), '') ||
    COALESCE(MAX(CASE WHEN rn = 5  THEN ',' || year END), '') ||
    COALESCE(MAX(CASE WHEN rn = 6  THEN ',' || year END), '') ||
    COALESCE(MAX(CASE WHEN rn = 7  THEN ',' || year END), '') ||
    COALESCE(MAX(CASE WHEN rn = 8  THEN ',' || year END), '') ||
    COALESCE(MAX(CASE WHEN rn = 9  THEN ',' || year END), '') ||
    COALESCE(MAX(CASE WHEN rn = 10 THEN ',' || year END), '')
    AS combination
  FROM CTE3
  GROUP BY id
)

SELECT combination
, COUNT(id) AS freq
FROM CTE4
GROUP BY combination

需要使用COALESCE()函数,因为SELECT 'a' || NULL会返回NULL。如果没有这个函数,结果将会是:

combination freq
null 5
英文:

Since Netezza lacks LISTAGG, STRING_AGG, and GROUP_CONCAT, this is a bit more challenging.

Reading that Netezza software was based on PostgreSQL 7.2, I created this against 9.6 -- the oldest available on dbfiddle.uk.

CREATE TABLE MY_TABLE (
id VARCHAR(50),
year VARCHAR(50)
);
INSERT INTO MY_TABLE (id, year) VALUES ('123', '2010');
INSERT INTO MY_TABLE (id, year) VALUES ('123', '2011');
INSERT INTO MY_TABLE (id, year) VALUES ('123', '2012'); 
INSERT INTO MY_TABLE (id, year) VALUES ('124', '2011');
INSERT INTO MY_TABLE (id, year) VALUES ('124', '2012');
INSERT INTO MY_TABLE (id, year) VALUES ('124', '2013');
INSERT INTO MY_TABLE (id, year) VALUES ('125', '2011');
INSERT INTO MY_TABLE (id, year) VALUES ('125', '2012');
INSERT INTO MY_TABLE (id, year) VALUES ('125', '2013');
INSERT INTO MY_TABLE (id, year) VALUES ('127', '2011');
INSERT INTO MY_TABLE (id, year) VALUES ('127', '2012');
INSERT INTO MY_TABLE (id, year) VALUES ('127', '2015');
INSERT INTO MY_TABLE (id, year) VALUES ('126', '2019');
WITH CTE AS (
SELECT id, year, ROW_NUMBER() OVER (PARTITION BY id ORDER BY year) AS rn
FROM MY_TABLE
),
CTE3 AS (
select id
, year
, rn
, max(rn) over (partition by id) as MaxRN
from cte
),
CTE4 AS (  
SELECT id
, coalesce(max(CASE WHEN rn = 1 THEN year END), '') ||
coalesce(max(CASE WHEN rn = 2  THEN ',' || year END), '') ||
coalesce(max(CASE WHEN rn = 3  THEN ',' || year END), '') ||
coalesce(max(CASE WHEN rn = 4  THEN ',' || year END), '') ||
coalesce(max(CASE WHEN rn = 5  THEN ',' || year END), '') ||
coalesce(max(CASE WHEN rn = 6  THEN ',' || year END), '') ||
coalesce(max(CASE WHEN rn = 7  THEN ',' || year END), '') ||
coalesce(max(CASE WHEN rn = 8  THEN ',' || year END), '') ||
coalesce(max(CASE WHEN rn = 9  THEN ',' || year END), '') ||
coalesce(max(CASE WHEN rn = 10 THEN ',' || year END), '')
AS combination
FROM CTE3
GROUP BY id
)
SELECT combination
, count(id) as freq
FROM CTE4
GROUP BY combination

COALESCE() is needed because SELECT 'a' || NULL returns NULL. Without this the result would be

combination freq
null 5

答案2

得分: 1

我不使用Netezza,也没有任何访问Netezza fiddle/playground以测试这个的途径。只是根据文档来看,似乎这样做会起作用:

with yeargroups (years) as 
(
SELECT LISTAGG(cast(year as varchar(400)), ', ') WITHIN GROUP(ORDER BY year)
      FROM My_Table 
      GROUP BY id
)
select years as Combination, count(*) as Frequency
from yeargroups
group by years;

PS:varchar(400)是为了适应100年(100*4)的情况,根据Netezza文档。

英文:

(Would be a mess in comments)
I don't use Netezza, nor have any access to a Netezza fiddle\playground to test this. Just from documentation it looks like this would do:

with yeargroups (years) as 
(
SELECT LISTAGG(cast(year as varchar(400)), ', ') WITHIN GROUP(ORDER BY year)
FROM My_Table 
GROUP BY id
)
select years as Combination, count(*) as Frequency
from yeargrooups
group by years;

PS: varchar(400) is for accommodating to 100 years (100*4) as per Netezza doc.

huangapple
  • 本文由 发表于 2023年7月7日 02:27:14
  • 转载请务必保留本文链接:https://go.coder-hub.com/76631612.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定