Getting duplicate records in result set even after distinct keyword
Question
I have a table as follows.
entity_id|effective_date|value|
A |2023-09-09 |234 |
A |2023-09-06 |345 |
B |2023-09-02 |341 |
C |2023-09-01 |347 |
I want to find all unique entity IDs with the maximum effective date and their respective value. I am trying the following query.
select distinct entity_id, value, max(effective_date) start_date
from refdata.investment_raw ir
where attribute_id = 232
  and entity_id in (select invest.val as investment_id
                    from refdata.ved soi
                    inner join refdata.ved invest
                       on soi.entity_id = invest.entity_id
                      and current_date between invest.start_date and invest.end_date
                      and invest.attribute_code = 'IssuerId'
                      and soi.attribute_code = 'SO'
                      and soi.val in ('1','2')
                      and current_date between soi.start_date and soi.end_date)
group by entity_id, value
With this, I get the following result set.
entity_id|effective_date|value|
A |2023-09-09 |234 |
A |2023-09-06 |345 |
B |2023-09-02 |341 |
C |2023-09-01 |347 |
My expected result set is
entity_id|effective_date|value|
A |2023-09-09 |234 |
B |2023-09-02 |341 |
C |2023-09-01 |347 |
In the expected result, each entity ID appears once, with its maximum effective date and the corresponding value. In the actual result set, entity ID A appears twice. When I remove value from the select list and from the group by clause, I get the expected rows but without the value column. I want the corresponding value for each distinct entity ID as well, without duplication. What is wrong with my query?
Answer 1
Score: 0
WITH YOUR_TABLE_DATA(ENTITY_ID, EFFECTIVE_DATE, VALUE) AS
(
    SELECT 'A', '2023-09-09'::DATE, 234 UNION ALL
    SELECT 'A', '2023-09-06'::DATE, 345 UNION ALL
    SELECT 'B', '2023-09-02'::DATE, 341 UNION ALL
    SELECT 'C', '2023-09-01'::DATE, 347
)
SELECT SQ.ENTITY_ID, SQ.EFFECTIVE_DATE, SQ.VALUE
FROM
(
    SELECT C.ENTITY_ID, C.EFFECTIVE_DATE, C.VALUE,
           ROW_NUMBER() OVER (PARTITION BY C.ENTITY_ID ORDER BY C.EFFECTIVE_DATE DESC) AS XCOL
    FROM YOUR_TABLE_DATA AS C
) SQ
WHERE SQ.XCOL = 1
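You can use the ROW_NUMBER filtering approach: number the rows within each entity_id by effective_date descending, then keep only the row numbered 1.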
Answer 2
Score: 0
We first produce a list of entities with their max effective date using group by, then we join the table with this list:
select t.*
from mytable t
inner join (
    select entity_id, max(effective_date) as max_effective_date
    from mytable
    group by entity_id
) as s on s.entity_id = t.entity_id
      and s.max_effective_date = t.effective_date
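One caveat worth checking with this join approach: if an entity has two rows sharing the same maximum effective_date, both rows come back, whereas a ROW_NUMBER() filter keeps exactly one. Below is a minimal sketch in Python, assuming SQLite (through the standard sqlite3 module) as a stand-in for PostgreSQL. The tied row with value 999 is hypothetical and not part of the question's data.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table mytable (entity_id text, effective_date text, value integer)"
)
conn.executemany(
    "insert into mytable values (?, ?, ?)",
    [
        ("A", "2023-09-09", 234),
        ("A", "2023-09-09", 999),  # hypothetical tie on A's latest date
        ("A", "2023-09-06", 345),
        ("B", "2023-09-02", 341),
        ("C", "2023-09-01", 347),
    ],
)

# Same join-on-max query as above, with an explicit column list and ordering.
join_rows = conn.execute(
    """
    select t.entity_id, t.effective_date, t.value
    from mytable t
    inner join (
        select entity_id, max(effective_date) as max_effective_date
        from mytable
        group by entity_id
    ) as s on s.entity_id = t.entity_id
          and s.max_effective_date = t.effective_date
    order by t.entity_id, t.value
    """
).fetchall()

for row in join_rows:
    print(row)
```

With the tie present, entity A is returned twice. If exactly one row per entity is required, a tie-breaking rule is needed, for example an extra column in the ORDER BY of the ROW_NUMBER() window.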
Answer 3
Score: 0
It seems like you want the PostgreSQL-specific DISTINCT ON, which is not the same thing as DISTINCT.
select distinct on (entity_id) entity_id, value, effective_date as start_date
from ...
ORDER BY entity_id, effective_date desc
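As for why the original query still returns A twice: GROUP BY entity_id, value forms one group per (entity_id, value) pair, so (A, 234) and (A, 345) are separate groups, each with its own max(effective_date). DISTINCT then compares whole rows, and those two rows differ, so neither is dropped. Below is a minimal sketch in Python, assuming SQLite (through the standard sqlite3 module) as a stand-in for PostgreSQL and a simplified investment_raw table without the attribute_id filter or the subquery.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table investment_raw (entity_id text, effective_date text, value integer)"
)
conn.executemany(
    "insert into investment_raw values (?, ?, ?)",
    [
        ("A", "2023-09-09", 234),
        ("A", "2023-09-06", 345),
        ("B", "2023-09-02", 341),
        ("C", "2023-09-01", 347),
    ],
)

# Grouping by (entity_id, value): one output row per distinct pair.
per_pair = conn.execute(
    """
    select distinct entity_id, value, max(effective_date) as start_date
    from investment_raw
    group by entity_id, value
    order by entity_id, start_date desc
    """
).fetchall()

# Grouping by entity_id alone: one row per entity, but no value column.
per_entity = conn.execute(
    """
    select entity_id, max(effective_date) as start_date
    from investment_raw
    group by entity_id
    order by entity_id
    """
).fetchall()

print(per_pair)
print(per_entity)
```

The first query reproduces the duplicated A, and the second matches the observation in the question that dropping value gives unique entities. Getting the value back alongside the latest date is what the approaches in the answers here (ROW_NUMBER, join on the max date, DISTINCT ON) are for.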