Getting duplicate records in the result set even after using the DISTINCT keyword

Question

I have a table as follows:

entity_id|effective_date|value|
A        |2023-09-09    |234  |
A        |2023-09-06    |345  |
B        |2023-09-02    |341  |
C        |2023-09-01    |347  |

For each unique entity ID, I want to find its maximum effective date together with the value from that row. I am trying the following query:

select distinct entity_id, value, max(effective_date) start_date
from refdata.investment_raw ir
where attribute_id  = 232
  and entity_id in (select invest.val as investment_id  
                    from refdata.ved soi
                    inner join refdata.ved invest
                      on soi.entity_id = invest.entity_id 
                     and current_date  between invest.start_date and invest.end_date
                     and invest.attribute_code = 'IssuerId'
                     and soi.attribute_code = 'SO'
                     and  soi.val in ('1','2')
                     and current_date between soi.start_date and soi.end_date)
group by entity_id, value 

With this, I get the following result set:

    entity_id|effective_date|value|
    A        |2023-09-09    |234  |
    A        |2023-09-06    |345  |
    B        |2023-09-02    |341  |
    C        |2023-09-01    |347  |

My expected result set is:

    entity_id|effective_date|value|
    A        |2023-09-09    |234  |
    B        |2023-09-02    |341  |
    C        |2023-09-01    |347  |

In the expected result, each entity ID appears only once, with its maximum effective date and the corresponding value. In the actual result set, entity ID A appears twice. When I remove the value column from both the SELECT list and the GROUP BY clause, I get the expected rows, but without the value column. I want each distinct entity ID together with its value, without duplicates. What is wrong with my query?
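The duplicate comes from the GROUP BY, not from DISTINCT. Grouping by both entity_id and value turns every distinct (entity_id, value) pair into its own group, so A produces two groups, (A, 234) and (A, 345), each with its own MAX(effective_date). DISTINCT then has nothing to remove, because those two rows differ in value. The sketch below reproduces this on the sample rows from the question; the CTE name sample_data is illustrative only, so the query runs in PostgreSQL without the real refdata tables.

```sql
-- Sample rows from the question; the CTE name sample_data is hypothetical.
WITH sample_data (entity_id, effective_date, value) AS (
    VALUES ('A', DATE '2023-09-09', 234),
           ('A', DATE '2023-09-06', 345),
           ('B', DATE '2023-09-02', 341),
           ('C', DATE '2023-09-01', 347)
)
-- Same shape as the question's query: grouping by entity_id AND value
-- creates one group per (entity_id, value) pair, so this returns 4 rows,
-- two of them for entity A.
SELECT DISTINCT entity_id, value, MAX(effective_date) AS start_date
FROM sample_data
GROUP BY entity_id, value
ORDER BY entity_id, start_date DESC;
```

Grouping by entity_id alone collapses A into a single group, but then PostgreSQL cannot tell which of A's values belongs to the maximum date, and selecting value without grouping or aggregating it is rejected with an error saying the column must appear in the GROUP BY clause or be used in an aggregate function. That is why the answers use ROW_NUMBER(), a join back on the maximum date, or DISTINCT ON instead.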

Answer 1

Score: 0

WITH YOUR_TABLE_DATA(ENTITY_ID, EFFECTIVE_DATE, VALUE) AS
(
   -- Sample rows from the question
   SELECT 'A','2023-09-09'::DATE,234 UNION ALL 
   SELECT 'A','2023-09-06'::DATE,345 UNION ALL
   SELECT 'B','2023-09-02'::DATE,341 UNION ALL 
   SELECT 'C','2023-09-01'::DATE,347
)
SELECT SQ.ENTITY_ID, SQ.EFFECTIVE_DATE, SQ.VALUE
FROM
 (
    SELECT C.ENTITY_ID, C.EFFECTIVE_DATE, C.VALUE,
     -- Number each entity's rows, starting from the latest effective date
     ROW_NUMBER() OVER (PARTITION BY C.ENTITY_ID ORDER BY C.EFFECTIVE_DATE DESC) AS XCOL
    FROM YOUR_TABLE_DATA AS C
 ) SQ
WHERE SQ.XCOL = 1  -- keep only the latest row per entity

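You can use a ROW_NUMBER() filtering approach: number the rows within each entity_id from the latest effective_date downwards, then keep only row number 1.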
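Applied to the question's own query, the same technique might look like the sketch below. The table, column, and filter names are copied from the question and this has not been run against the real schema; the subquery alias ranked and the column alias rn are illustrative names.

```sql
-- Sketch only: names and filters are taken from the question's query.
SELECT entity_id, value, effective_date AS start_date
FROM (
    SELECT ir.entity_id,
           ir.effective_date,
           ir.value,
           -- rn = 1 marks the latest effective_date within each entity_id
           ROW_NUMBER() OVER (PARTITION BY ir.entity_id
                              ORDER BY ir.effective_date DESC) AS rn
    FROM refdata.investment_raw ir
    WHERE ir.attribute_id = 232
      AND ir.entity_id IN (SELECT invest.val AS investment_id
                           FROM refdata.ved soi
                           INNER JOIN refdata.ved invest
                             ON soi.entity_id = invest.entity_id
                            AND current_date BETWEEN invest.start_date AND invest.end_date
                            AND invest.attribute_code = 'IssuerId'
                            AND soi.attribute_code = 'SO'
                            AND soi.val IN ('1', '2')
                            AND current_date BETWEEN soi.start_date AND soi.end_date)
) ranked
WHERE rn = 1
ORDER BY entity_id;
```

The filters sit inside the subquery so that rows are numbered only among the rows you actually want; filtering after ROW_NUMBER() could discard an entity's rn = 1 row and drop that entity entirely. If two rows of one entity share the same maximum effective_date, ROW_NUMBER() keeps an arbitrary one of them, whereas RANK() would give both rank 1 and keep both.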

Answer 2

Score: 0

First, we use GROUP BY to produce a list of entities with their maximum effective date; then we join the table with this list:

select t.*
from mytable t
inner join (
  select entity_id, max(effective_date) as max_effective_date
  from mytable
  group by entity_id
) as s on s.entity_id = t.entity_id and s.max_effective_date = t.effective_date

Demo here
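The query above uses a generic table name, mytable. Applied to the question's query, the attribute_id and IN (...) filters have to restrict both the rows the maximum is computed over and the rows joined back; otherwise an unfiltered row with the same entity_id and date could slip into the result. A minimal sketch under that assumption, using hypothetical CTE names filtered and latest and the names copied from the question (not run against the real schema):

```sql
-- Sketch only: names and filters are taken from the question's query.
WITH filtered AS (
    -- Apply the question's filters once, so both steps below see the same rows
    SELECT ir.entity_id, ir.effective_date, ir.value
    FROM refdata.investment_raw ir
    WHERE ir.attribute_id = 232
      AND ir.entity_id IN (SELECT invest.val AS investment_id
                           FROM refdata.ved soi
                           INNER JOIN refdata.ved invest
                             ON soi.entity_id = invest.entity_id
                            AND current_date BETWEEN invest.start_date AND invest.end_date
                            AND invest.attribute_code = 'IssuerId'
                            AND soi.attribute_code = 'SO'
                            AND soi.val IN ('1', '2')
                            AND current_date BETWEEN soi.start_date AND soi.end_date)
),
latest AS (
    -- Latest effective_date per entity, among the filtered rows only
    SELECT entity_id, MAX(effective_date) AS max_effective_date
    FROM filtered
    GROUP BY entity_id
)
SELECT f.entity_id, f.value, f.effective_date AS start_date
FROM filtered f
INNER JOIN latest l
        ON l.entity_id = f.entity_id
       AND l.max_effective_date = f.effective_date
ORDER BY f.entity_id;
```

Unlike the ROW_NUMBER() version, this join returns every row that ties for an entity's maximum effective_date, so an entity can still appear more than once if two of its rows share that date.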

Answer 3

Score: 0

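It seems like you want the PostgreSQL-specific DISTINCT ON, which is not the same thing as DISTINCT: DISTINCT ON (entity_id) keeps only the first row for each entity_id, where "first" is determined by the ORDER BY.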

select distinct on (entity_id) entity_id, value, effective_date as start_date
from ...
ORDER BY entity_id, effective_date desc 
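To see DISTINCT ON work end to end, here is a self-contained sketch run on the sample rows from the question. The CTE name sample_data is illustrative; in the real query, the FROM and WHERE parts of the question's query would take its place.

```sql
-- Sample rows from the question; the CTE name sample_data is hypothetical.
WITH sample_data (entity_id, effective_date, value) AS (
    VALUES ('A', DATE '2023-09-09', 234),
           ('A', DATE '2023-09-06', 345),
           ('B', DATE '2023-09-02', 341),
           ('C', DATE '2023-09-01', 347)
)
-- DISTINCT ON (entity_id) keeps the first row of each entity_id group,
-- as ordered by ORDER BY. The leading ORDER BY expression must match the
-- DISTINCT ON expression, and effective_date DESC puts the latest row first.
SELECT DISTINCT ON (entity_id)
       entity_id, value, effective_date AS start_date
FROM sample_data
ORDER BY entity_id, effective_date DESC;
-- Returns: A / 234 / 2023-09-09, B / 341 / 2023-09-02, C / 347 / 2023-09-01
```

As with ROW_NUMBER(), if two rows of an entity tie on the latest effective_date, which one is kept is unspecified unless further ORDER BY columns break the tie.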

huangapple
  • This article was published on March 7, 2023 at 21:09:58
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75662403.html