用GROUP BY子句的减法查询产生不同的结果。

huangapple go评论52阅读模式
英文:

Minus query with group by clause giving different result

问题

I have checked various similar questions but could not find the answer.
我已经查看了各种类似的问题,但未能找到答案。

I have 2 same tables (table1 and table2) but it gets loaded at different times, once in the morning and the other at night.
我有两个相同的表(table1和table2),但它们在不同的时间加载,一个在早上,另一个在晚上。

I did a minus query on these:-
我对这些表执行了减法查询:-

select data1,data2,data3,
    file_dt from table1 minus select data1,data2,data3,
    file_dt from table2.

select data1,data2,data3,
    file_dt from table2 minus 
select data1,data2,data3,
    file_dt from table1

and got the 0 records which is the desired result.
并获得了0条记录,这是期望的结果。

But due to parallel loading, I need to fetch unique records from table1 and table2.
但由于并行加载,我需要从table1和table2中获取唯一的记录。

I tried both rowid and rownumber approach but still got records in the minus query.
我尝试了rowid和rownumber两种方法,但在减法查询中仍然得到了记录。

Approach1(rowid):-

select data1,data2,data3,
    file_dt from table1
 where rowid IN ( select rid
                    from (select rowid rid, 
                                 row_number() over (partition by 
                         data2,data3
                                   order by file_dt DESC) rn
                            from table1)
                   where rn = 1)
minus
select data1,data2,data3,
    file_dt from table2
 where rowid IN ( select rid
                    from (select rowid rid, 
                                 row_number() over (partition by 
                         data2,data3
                                   order by file_dt DESC) rn
                            from table2)
                   where rn = 1)

I get 2 records but expect 0 records.
我得到2条记录,但期望得到0条记录。

Approach2(row_number):-

select data1,data2,data3,
        file_dt,
       ROW_NUMBER() OVER(PARTITION BY data2,data3
                         ORDER BY file_dt desc) rn        
from table1)
WHERE rn = 1
minus
select data1,data2,data3,
        file_dt,
       ROW_NUMBER() OVER(PARTITION BY data2,data3
                         ORDER BY file_dt desc) rn        
from table2)
WHERE rn = 1

Again, I get some records.
再次,我得到了一些记录。

Any help is appreciated!
感谢任何帮助!

英文:

I have checked various similar questions but could not find the answer.
I have 2 same tables(table1 and table2) but it gets loaded at different time, once in the morning and other at the night.
I did minus query on these:-

select data1,data2,data3,
    file_dt from table1 minus select data1,data2,data3,
    file_dt from table2.

select data1,data2,data3,
    file_dt from table2 minus 
select data1,data2,data3,
    file_dt from table1

and get the 0 records which is desired results.
But due to parallel loading, I need to fetch unique records from table1 and table2.
I tried both rowid and rownumber approach still getting records in minus query.
Approach1(rowid):-

select data1,data2,data3,
    file_dt from table1
 where rowid IN ( select rid
                    from (select rowid rid, 
                                 row_number() over (partition by 
                         data2,data3
                                   order by file_dt DESC) rn
                            from table1)
                   where rn = 1)
minus
select data1,data2,data3,
    file_dt from table2
 where rowid IN ( select rid
                    from (select rowid rid, 
                                 row_number() over (partition by 
                         data2,data3
                                   order by file_dt DESC) rn
                            from table2)
                   where rn = 1)

I get 2 records but expect 0 records.
Approach2(row_number):-

select data1,data2,data3,
        file_dt,
       ROW_NUMBER() OVER(PARTITION BY data2,data3
                         ORDER BY file_dt desc) rn        
from table1)
WHERE rn = 1
minus
select data1,data2,data3,
        file_dt,
       ROW_NUMBER() OVER(PARTITION BY data2,data3
                         ORDER BY file_dt desc) rn        
from table2)
WHERE rn = 1

Again get some records.
Any help is appreciated!

答案1

得分: 1

以下是翻译好的部分:

> 我对这些执行了减法查询,并得到了所需的结果,即0条记录。

假设你执行了类似以下的操作:

select data1, data2, data3, file_dt from table1
minus
select data1, data2, data3, file_dt from table2

这意味着在table1中没有数据,这些数据也没有出现在table2中。

这并不意味着这两个表是相同的——table2中可能有table1中没有出现的数据。减法操作只是单向的 - 来自文档,“由第一个查询选择的所有不同行,但不包括第二个查询选择的行”。

如果table2中具有相同的data1/2/3组合但具有较晚日期的附加数据,那么基本的减法查询将无法看到它们。但是,当你将查询更改为基于以下内容时:

row_number() over (partition by data2, data3 order by file_dt DESC) as rn

那么两个子查询都只会获取data2/3组合的最新行(基于file_dt),但不包括data1,这看起来有点奇怪,但在这一点上并不重要。

这意味着第二个子查询将捕获新于第一个子查询中的记录。

为了演示一个简单的例子,如果你在table1中有以下数据:

DATA1 DATA2 DATA3 FILE_DT
1 2 3 01-JAN-23
2 3 4 02-JAN-23
3 4 5 03-JAN-23

以及在table2中有以下数据:

DATA1 DATA2 DATA3 FILE_DT
1 2 3 01-JAN-23
2 3 4 02-JAN-23
3 4 5 03-JAN-23
2 3 4 04-JAN-23
3 4 5 05-JAN-23

第一个表中的所有三行也出现在第二个表中,所以当你对它们执行减法时,不会返回任何数据。

如果你查看两个表的rn计算:

select data1, data2, data3, file_dt,
  row_number() over (partition by data2, data3 order by file_dt DESC) as rn
from table1
DATA1 DATA2 DATA3 FILE_DT RN
1 2 3 01-JAN-23 1
2 3 4 02-JAN-23 1
3 4 5 03-JAN-23 1
select data1, data2, data3, file_dt
from (
  select data1, data2, data3, file_dt,
    row_number() over (partition by data2, data3 order by file_dt DESC) as rn
  from table1
)
where rn = 1
DATA1 DATA2 DATA3 FILE_DT
1 2 3 01-JAN-23
2 3 4 02-JAN-23
3 4 5 03-JAN-23
select data1, data2, data3, file_dt,
  row_number() over (partition by data2, data3 order by file_dt DESC) as rn
from table2
DATA1 DATA2 DATA3 FILE_DT RN
1 2 3 01-JAN-23 1
2 3 4 04-JAN-23 1
2 3 4 02-JAN-23 2
3 4 5 05-JAN-23 1
3 4 5 03-JAN-23 2
select data1, data2, data3, file_dt
from (
  select data1, data2, data3, file_dt,
    row_number() over (partition by data2, data3 order by file_dt DESC) as rn
  from table2
)
where rn = 1
DATA1 DATA2 DATA3 FILE_DT
1 2 3 01-JAN-23
2 3 4 04-JAN-23
3 4 5 05-JAN-23

你可以看到在这两个子查询中,rn = 1的过滤返回了不同的行。然后,当你执行以下操作时:

| 1 | 2 | 3 | 01-JAN-23 |
| 2 | 3 | 4 | 02-JAN-23 |
| 3 | 4 | 5 | 03-JAN-23 |

减法

| 1 | 2 | 3 | 01-JAN-23 |
| 2 | 3 | 4 | 04-JAN-23 |
| 3 | 4 | 5 | 05-JAN-23 |

唯一的共同行现在是

| 1 | 2 | 3 | 01-JAN-23 |

所以只有那一行被minus操作消除,你的查询返回其他两行:

DATA1 DATA2 DATA3 FILE_DT
2 3 4 02-JAN-23
3 4 5 03-JAN-23

[fiddle](https://dbfiddle

英文:

> I did minus query on these and get the 0 records which is desired results

Assuming you did something like:

select data1, data2, data3, file_dt from table1
minus
select data1, data2, data3, file_dt from table2

that means there is no data in table1 which does not also appear in table2.

It does not mean the tables are identical - there could be data in table2 which does not appear in table1. The minus operation only goes one way - from the docs, "All distinct rows selected by the first query but not the second".

If table2 has additional data for the same data1/2/3 combinations but with later dates then those won't be seen by that basic minus query. But when you change the query to be based on

row_number() over (partition by data2,data3 order by file_dt DESC) as rn

then both subqueries only get the most recent row (based on file_dt) for the combination of data2/3 (but not data1, which looks odd but doesn't matter at this point).

That means the second subquery will pick up records that are newer than those in the first subquery.

To demonstrate with a simple example, if you have this data in table1:

DATA1 DATA2 DATA3 FILE_DT
1 2 3 01-JAN-23
2 3 4 02-JAN-23
3 4 5 03-JAN-23

and this in table2:

DATA1 DATA2 DATA3 FILE_DT
1 2 3 01-JAN-23
2 3 4 02-JAN-23
3 4 5 03-JAN-23
2 3 4 04-JAN-23
3 4 5 05-JAN-23

All three rows in the first table also appear in the second table, so when you minus them no data is returned.

If you look at the rn calculation for both tables:

select data1, data2, data3, file_dt,
  row_number() over (partition by data2,data3 order by file_dt DESC) as rn
from table1
DATA1 DATA2 DATA3 FILE_DT RN
1 2 3 01-JAN-23 1
2 3 4 02-JAN-23 1
3 4 5 03-JAN-23 1
select data1, data2, data3, file_dt
from (
  select data1, data2, data3, file_dt,
    row_number() over (partition by data2,data3 order by file_dt DESC) as rn
  from table1
)
where rn = 1
DATA1 DATA2 DATA3 FILE_DT
1 2 3 01-JAN-23
2 3 4 02-JAN-23
3 4 5 03-JAN-23
select data1, data2, data3, file_dt,
  row_number() over (partition by data2,data3 order by file_dt DESC) as rn
from table2
DATA1 DATA2 DATA3 FILE_DT RN
1 2 3 01-JAN-23 1
2 3 4 04-JAN-23 1
2 3 4 02-JAN-23 2
3 4 5 05-JAN-23 1
3 4 5 03-JAN-23 2
select data1, data2, data3, file_dt
from (
  select data1, data2, data3, file_dt,
    row_number() over (partition by data2,data3 order by file_dt DESC) as rn
  from table2
)
where rn = 1
DATA1 DATA2 DATA3 FILE_DT
1 2 3 01-JAN-23
2 3 4 04-JAN-23
3 4 5 05-JAN-23

you can see that filtering on rn = 1 returns different rows for the two subqueries. When you effectively then do:

| 1 | 2 | 3 | 01-JAN-23 |
| 2 | 3 | 4 | 02-JAN-23 |
| 3 | 4 | 5 | 03-JAN-23 |

MINUS

| 1 | 2 | 3 | 01-JAN-23 |
| 2 | 3 | 4 | 04-JAN-23 |
| 3 | 4 | 5 | 05-JAN-23 |

the only common row is now

| 1 | 2 | 3 | 01-JAN-23 |

so only that row is elminated by the minus operation, and your query returns the other two:

DATA1 DATA2 DATA3 FILE_DT
2 3 4 02-JAN-23
3 4 5 03-JAN-23

fiddle

答案2

得分: 1

以下是您要翻译的内容:

这是我看待这个问题的方式:

您选择

从table1中选择data1、data2、data3、file_dt
减去
从table2中选择data1、data2、data3、file_dt

并且没有返回结果。然后您再次执行相同的操作,但只关注表中data2和data3的最新条目

从<table1中的最新data2/data3行>中选择data1、data2、data3、file_dt
减去
从table2中的最新data2/data3行中选择data1、data2、data3、file_dt

然后突然返回结果。

这只是意味着

  • table2包含了table1中的每一行(第一个查询)
  • 但table2还包含了额外的(更新的)行,这些行在table1中不存在。这导致table1中的最新行与table2中的最新行没有匹配项(第二个查询)

这看起来是正确的。您提到这两个表在不同时间获取数据,因此其中一个表包含更新的信息,如果只比较最新的信息,那么就会出现差异(至少日期时间会不同)。

嗯,现在我发布了我的答案,stackoverflow突然显示出两个更早发布的答案。所以我猜我的答案对这个问题不会有太多帮助。

英文:

This is how I see this:

You select

select data1, data2, data3, file_dt from table1
minus
select data1, data2, data3, file_dt from table2

and get no rows. Then you do the same, but only look at the latest entry per data2, data3 in the tables

select data1, data2, data3, file_dt from <latest data2/data3 rows in table1>
minus
select data1, data2, data3, file_dt from latest data2/data3 rows in table2>

and suddenly you get rows.

That simply means

  • that table2 contains every row that is in table1 (first query)
  • but table2 also contains additional (newer) rows that don't exist in table1. This leads to the newest row in table1 not having a match with the newest row in table2. (second query)

This looks correct. You say that the tables get their data at different times, so one table has newer information, and if you only compare the newest information, then you'll have differences (at least the datetime will be different).

Well, now that I posted my answer, stackoverflow suddenly shows two other answers posted much earlier. So I guess my answer won't add much to the case.

答案3

得分: 0

以下是翻译好的部分:

"不太清楚您的需求。您说“我需要从table1和table2中获取唯一记录”,但随后又说您期望得到0行。尝试获取唯一记录时,为什么会得到0行呢?

那么,暂且不考虑0行的问题,如果您需要从这些表中获取唯一记录,以下方法是否适合您?将两个表使用UNION ALL组合在一起,然后计算合并集合的ROW_NUMBER,最后选择您需要的行。

select data1,data2,data3,file_dt
  from (select data1,data2,data3,file_dt,
               ROW_NUMBER() OVER (PARTITION BY data2,data3 ORDER BY file_dt DESC) seq
          from (select data1,data2,data3,file_dt from table1
                union all
                select data1,data2,data3,file_dt from table2))
 where seq = 1
英文:

It's a bit unclear what you're after. You say "I need to fetch unique records from table1 and table2" but then say you expect 0 rows. Why would you get 0 rows when trying to get unique records?

Setting aside the 0-rows thing then, if your need is to get unique records from the tables, does this work for you? Combine the two tables with UNION ALL, then compute your ROW_NUMBER of the combined set, then select your desired row.

select data1,data2,data3,file_dt
  from (select data1,data2,data3,file_dt,
               ROW_NUMBER() OVER (PARTITION BY data2,data3 ORDER BY file_dt DESC) seq
          from (select data1,data2,data3,file_dt from table1
                union all
                select data1,data2,data3,file_dt from table2))
 where seq = 1

huangapple
  • 本文由 发表于 2023年4月19日 22:01:44
  • 转载请务必保留本文链接:https://go.coder-hub.com/76055440.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定