Get the unique IDs of the max and min values when using GROUP BY

Question

I have a table with the columns unique_id, product, root_location, and price, holding more than 50 million records.

I want the result to contain product, min_price, min_price_unique_id, max_price, and max_price_unique_id.

My query:

select product
    , min(price) as min_price
    , max(price) as max_price
from mytable
group by product

How can I get the unique IDs of the rows with the min and max price?

Answer 1

Score: 2

You could try RANK and STRING_AGG with conditional aggregation, as follows:

SELECT  product,
        MIN(price) AS min_price,
        STRING_AGG(CASE WHEN rn1 = 1 THEN unique_id END, ',') min_price_unique_id,
        MAX(price) AS max_price,
        STRING_AGG(CASE WHEN rn2 = 1 THEN unique_id END, ',') max_price_unique_id
FROM
(
  SELECT *,
    RANK() OVER (PARTITION BY product ORDER BY price) rn1,
    RANK() OVER (PARTITION BY product ORDER BY price DESC) rn2
  FROM tbl_name
) T
WHERE rn1 = 1 OR rn2 = 1
GROUP BY product
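
The RANK-based query can be tried on a tiny table. The following is a minimal sketch, assuming SQLite through Python's sqlite3 module rather than the asker's (unstated) database; GROUP_CONCAT stands in for STRING_AGG, and the sample rows are made up for illustration. Two apple rows tie at the lowest price, so both ids should come back in min_price_unique_id.

```python
import sqlite3

# In-memory database with a few hypothetical rows (not from the original post).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable (unique_id INTEGER, product TEXT, root_location TEXT, price INTEGER);
    INSERT INTO mytable VALUES
        (1, 'apple', 'north', 1),
        (2, 'apple', 'south', 1),
        (3, 'apple', 'east',  3),
        (4, 'pear',  'north', 2),
        (5, 'pear',  'west',  5);
""")

# RANK() gives every row tied at the lowest (or highest) price the rank 1,
# so the conditional GROUP_CONCAT collects all tied ids.
# GROUP_CONCAT skips the NULLs produced by the CASE expressions.
query = """
    SELECT product,
           MIN(price) AS min_price,
           GROUP_CONCAT(CASE WHEN rn1 = 1 THEN unique_id END) AS min_price_unique_id,
           MAX(price) AS max_price,
           GROUP_CONCAT(CASE WHEN rn2 = 1 THEN unique_id END) AS max_price_unique_id
    FROM (
        SELECT *,
               RANK() OVER (PARTITION BY product ORDER BY price)      AS rn1,
               RANK() OVER (PARTITION BY product ORDER BY price DESC) AS rn2
        FROM mytable
    ) AS t
    WHERE rn1 = 1 OR rn2 = 1
    GROUP BY product
    ORDER BY product
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The order of ids inside the concatenated string is not guaranteed, so compare it as a set rather than as an exact string.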

Update:
To return each unique_id only once in case the table contains duplicate rows, you could add another subquery/CTE that applies ROW_NUMBER partitioned by product and unique_id, then keep only the rows where row_num = 1.

WITH CTE1 AS
(
  SELECT *,
    RANK() OVER (PARTITION BY product ORDER BY price) rn1,
    RANK() OVER (PARTITION BY product ORDER BY price DESC) rn2
  FROM tbl_name
),
CTE2 AS
(
  SELECT *,
    ROW_NUMBER() OVER (PARTITION BY product, unique_id ORDER BY unique_id) row_num
  FROM CTE1
)
SELECT  product,
        MIN(price) AS min_price,
        STRING_AGG(CASE WHEN rn1 = 1 THEN unique_id END, ',') min_price_unique_id,
        MAX(price) AS max_price,
        STRING_AGG(CASE WHEN rn2 = 1 THEN unique_id END, ',') max_price_unique_id
FROM CTE2
WHERE (rn1 = 1 OR rn2 = 1) AND row_num = 1
GROUP BY product

Answer 2

Score: 1

select a.*, b.unique_id as min_price_unique_id, c.unique_id as max_price_unique_id
from (
  select product
    , min(price) as min_price
    , max(price) as max_price
  from mytable
  group by product
) a
join mytable b
  on a.product = b.product
  and a.min_price = b.price
join mytable c
  on a.product = c.product
  and a.max_price = c.price

This query joins the subquery back to the original table mytable twice to get the unique ids of the minimum and maximum prices. The subquery a calculates the min and max price for each product. The first join finds the unique id of the row at the minimum price, and the second finds the unique id of the row at the maximum price. Each join matches on both product and price, so a row of a different product that happens to have the same price is not picked up. If several rows of a product share the min or max price, the result contains one row per combination.
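
A small sketch of why the join condition matters, assuming SQLite through Python's sqlite3 module and made-up sample rows: joining on price alone also matches rows of other products that share the price, while joining on product and price keeps each product's ids separate. The names AGG, price_only_rows, and per_product_rows are illustrative only.

```python
import sqlite3

# Hypothetical data: apple's max price (3) equals pear's min price (3).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable (unique_id INTEGER, product TEXT, root_location TEXT, price INTEGER);
    INSERT INTO mytable VALUES
        (1, 'apple', 'north', 1),
        (2, 'apple', 'south', 3),
        (3, 'pear',  'north', 3),
        (4, 'pear',  'west',  5);
""")

# Per-product min and max, shared by both variants.
AGG = """
    SELECT product, MIN(price) AS min_price, MAX(price) AS max_price
    FROM mytable
    GROUP BY product
"""

# Variant 1: join on price only. Rows from other products can match.
price_only_rows = conn.execute(f"""
    SELECT a.product, b.unique_id, c.unique_id
    FROM ({AGG}) a
    JOIN mytable b ON a.min_price = b.price
    JOIN mytable c ON a.max_price = c.price
    ORDER BY a.product, b.unique_id, c.unique_id
""").fetchall()

# Variant 2: join on product and price. Only the product's own rows match.
per_product_rows = conn.execute(f"""
    SELECT a.product, b.unique_id, c.unique_id
    FROM ({AGG}) a
    JOIN mytable b ON a.product = b.product AND a.min_price = b.price
    JOIN mytable c ON a.product = c.product AND a.max_price = c.price
    ORDER BY a.product, b.unique_id, c.unique_id
""").fetchall()

print("price only:   ", price_only_rows)
print("product+price:", per_product_rows)
```

On a table with 50 million rows, either variant will likely need an index covering product and price to perform acceptably; that is a general expectation, not something measured here.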

Answer 3

Score: 1

It really depends on what you want from unique_id. When several rows share the min or max price, do you want to return all of their unique_id values or just one?

Depending on your requirement, use row_number() or dense_rank(). Check the documentation for details on these two window functions.

The query below returns one unique_id per product for the min price and one for the max price.

select product,
       min_price           = max(case when min_rn = 1 then price end),
       min_price_unique_id = max(case when min_rn = 1 then unique_id end),
       max_price           = max(case when max_rn = 1 then price end),
       max_price_unique_id = max(case when max_rn = 1 then unique_id end)
from
(
    select *, 
           min_rn = row_number() over (partition by product order by price),
           max_rn = row_number() over (partition by product order by price desc)
    from   mytable
) t
group by product
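
When prices tie, row_number() numbers the tied rows in an unspecified order, so which unique_id is returned can vary between runs. A minimal sketch of one way to make the pick deterministic, assuming SQLite through Python's sqlite3 module, made-up sample rows, and unique_id added as a secondary sort key (that tie-breaker is an addition, not part of the query above); SQLite needs AS aliases instead of the alias = expression form:

```python
import sqlite3

# Hypothetical data: apple has ties at both its lowest and highest price.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable (unique_id INTEGER, product TEXT, root_location TEXT, price INTEGER);
    INSERT INTO mytable VALUES
        (1, 'apple', 'north', 10),
        (2, 'apple', 'south', 10),
        (3, 'apple', 'east',  30),
        (4, 'apple', 'west',  30),
        (5, 'pear',  'north', 20);
""")

# ROW_NUMBER() assigns 1 to exactly one row per product and direction.
# Sorting by unique_id after price means the smallest id wins every tie.
query = """
    SELECT product,
           MAX(CASE WHEN min_rn = 1 THEN price END)     AS min_price,
           MAX(CASE WHEN min_rn = 1 THEN unique_id END) AS min_price_unique_id,
           MAX(CASE WHEN max_rn = 1 THEN price END)     AS max_price,
           MAX(CASE WHEN max_rn = 1 THEN unique_id END) AS max_price_unique_id
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY product ORDER BY price, unique_id)      AS min_rn,
               ROW_NUMBER() OVER (PARTITION BY product ORDER BY price DESC, unique_id) AS max_rn
        FROM mytable
    ) AS t
    GROUP BY product
    ORDER BY product
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

A product with a single row, like pear here, reports the same unique_id for both the min and the max price.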

Edit: if you want to show all the unique_id values that correspond to the min and max price, here is one way. Use two separate queries to identify the min-price and max-price rows, then combine them with a full outer join (the number of unique_id rows at the min price may differ from the number at the max price).

with 
min_price as
(
    select  product, min_price = price, unique_id, 
            rn = row_number() over (partition by product order by unique_id)
    from
    (
        select *, 
               min_rn = dense_rank() over (partition by product order by price)
        from   mytable
    ) m
    where min_rn = 1
),
max_price as
(
    select  product, max_price = price, unique_id, 
            rn = row_number() over (partition by product order by unique_id)
    from
    (
        select *, 
               max_rn = dense_rank() over (partition by product order by price desc)
        from   mytable
    ) m
    where max_rn = 1
) 
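-- coalesce keeps product filled in when only the max_price side has a row
select product = coalesce(mn.product, mx.product),
       mn.min_price, min_price_unique_id = mn.unique_id,
       mx.max_price, max_price_unique_id = mx.unique_id
from   min_price mn
       full outer join max_price mx on  mn.product = mx.product
                                    and mn.rn      = mx.rn
order by coalesce(mn.product, mx.product), coalesce(mn.rn, mx.rn)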

huangapple
  • Posted on June 19, 2023, 12:35:14
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76503630.html