Using Min and Max to remove duplicate rows, and how to handle null values

Question

I am creating a SQL query in Oracle and am trying to remove duplicate rows by using the MIN and MAX aggregate functions in a CASE expression. Here is the code in its current state:

select
   Student_number,
   case
      when min(sr.racecd) = max(sr.racecd) then min(sr.racecd)
      else 'Two or more races'
   end as races
-- (FROM clause with alias sr and GROUP BY Student_number not shown in the post)

This is what the output looks like:

Student Number   Race
4322             Two or more races
4324             White

When I run the code, it combines each student's rows into one and labels students with more than one race code 'Two or more races'. The problem is that when a student's race code is NULL, it also returns 'Two or more races'. How can I keep the NULLs as they are, or change them to 'Unknown'?

Also, when I add other columns to the query, the aggregate functions no longer behave the same way as when I query only Student_number and racecd. Why is that?
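The behavior described above can be reproduced with a small, self-contained script. This is a sketch under assumptions: the table `race_repro` and its rows are hypothetical (the post does not show the real schema), one row is assumed per student per race code, and the `sr` alias is dropped because only one table is involved. It sticks to plain SQL, so it should run on Oracle and, as far as I know, on SQLite as well.

```sql
-- Hypothetical table: one row per (student, race code); not the poster's real schema.
CREATE TABLE race_repro (
    student_number NUMBER,
    racecd         VARCHAR2(30)
);

INSERT INTO race_repro VALUES (4322, 'White');
INSERT INTO race_repro VALUES (4322, 'Asian');   -- two different codes
INSERT INTO race_repro VALUES (4324, 'White');
INSERT INTO race_repro VALUES (4324, 'White');   -- duplicate row, same code
INSERT INTO race_repro VALUES (4330, NULL);      -- race code missing

-- The query from the question, completed with FROM and GROUP BY.
SELECT student_number,
       CASE
          WHEN MIN(racecd) = MAX(racecd) THEN MIN(racecd)
          ELSE 'Two or more races'
       END AS races
FROM race_repro
GROUP BY student_number
ORDER BY student_number;
```

For student 4330, MIN and MAX both return NULL, and NULL = NULL is not TRUE, so the CASE takes the ELSE branch: 4322 and 4330 both come back as 'Two or more races', while 4324 comes back as 'White'.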

Answer 1

Score: 0

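NULL is not equal to anything, not even another NULL. When all of a student's race codes are NULL, MIN and MAX both return NULL, the comparison min(sr.racecd) = max(sr.racecd) evaluates to UNKNOWN rather than TRUE, and the CASE falls through to the ELSE branch, which returns 'Two or more races'. There are a number of solutions. One is to use COUNT(DISTINCT ...) instead, which ignores NULLs, like this: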

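select
   Student_number,
   CASE WHEN COUNT(DISTINCT sr.racecd) > 1 THEN 'Two or more races'
        ELSE MAX(sr.racecd)   -- NULL only when every racecd for the student is NULL
   END AS races
-- FROM clause and GROUP BY Student_number as in the question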
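The COUNT(DISTINCT ...) approach leaves students whose codes are all NULL as NULL. To show them as 'Unknown' instead, as the question asks, one option is to wrap MAX in COALESCE (Oracle's NVL would work the same way). The sketch below assumes a hypothetical table `race_fixed` with made-up rows, one row per student per race code.

```sql
-- Hypothetical table and rows, for illustration only.
CREATE TABLE race_fixed (
    student_number NUMBER,
    racecd         VARCHAR2(30)
);

INSERT INTO race_fixed VALUES (4322, 'White');
INSERT INTO race_fixed VALUES (4322, 'Asian');
INSERT INTO race_fixed VALUES (4324, 'White');
INSERT INTO race_fixed VALUES (4324, NULL);      -- NULL alongside a real code
INSERT INTO race_fixed VALUES (4330, NULL);      -- only NULLs for this student

SELECT student_number,
       CASE
          -- COUNT(DISTINCT ...) ignores NULLs, so a NULL row never counts as an extra race
          WHEN COUNT(DISTINCT racecd) > 1 THEN 'Two or more races'
          -- MAX also ignores NULLs; it returns NULL only when every code is NULL
          ELSE COALESCE(MAX(racecd), 'Unknown')
       END AS races
FROM race_fixed
GROUP BY student_number
ORDER BY student_number;
```

This returns 'Two or more races' for 4322, 'White' for 4324 (its NULL row is ignored), and 'Unknown' for 4330. Using plain ELSE MAX(racecd) instead would return NULL for 4330, which is the "keep the NULLs as is" option.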

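As for adding columns: any column in the SELECT list that is not aggregated must also be included in the GROUP BY (otherwise Oracle raises ORA-00979: not a GROUP BY expression). That changes the granularity of the query, and therefore which rows end up in each group, so it affects the result: if the new column has different values across a student's rows, those rows are split into separate groups, and each group may then contain only one race code. If you want more information about the student besides Student_number, aggregate the other columns as well (e.g. with MAX()).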
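The effect of grouping by an extra column can be seen in a small sketch. The table `race_grain` and its `school_year` column are hypothetical, chosen only because the value differs between a student's two rows; the poster's actual extra columns are not known.

```sql
-- Hypothetical table: race codes recorded per school year (illustrative only).
CREATE TABLE race_grain (
    student_number NUMBER,
    school_year    NUMBER,
    racecd         VARCHAR2(30)
);

INSERT INTO race_grain VALUES (4322, 2022, 'White');
INSERT INTO race_grain VALUES (4322, 2023, 'Asian');

-- 1) Unaggregated column added to GROUP BY: one group per (student, year),
--    so each group holds a single race code and 'Two or more races' never appears.
SELECT student_number, school_year,
       CASE WHEN COUNT(DISTINCT racecd) > 1 THEN 'Two or more races'
            ELSE MAX(racecd)
       END AS races
FROM race_grain
GROUP BY student_number, school_year
ORDER BY student_number, school_year;

-- 2) Extra column aggregated instead: still one group (one row) per student.
SELECT student_number, MAX(school_year) AS latest_year,
       CASE WHEN COUNT(DISTINCT racecd) > 1 THEN 'Two or more races'
            ELSE MAX(racecd)
       END AS races
FROM race_grain
GROUP BY student_number
ORDER BY student_number;
```

The first query returns two rows for 4322 ('White' for 2022, 'Asian' for 2023); the second returns a single row, 4322 / 2023 / 'Two or more races'. Which aggregate fits the extra column (MAX, MIN, LISTAGG, ...) depends on what that column means.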

huangapple
  • This article was posted on 2023-02-24 07:46:38.
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75551403.html