How to select records by a condition in PostgreSQL, and from that set select the records that share the same value in some field?

Question

There is a table with millions of records.

A query like this

select c.id, c.mobile_phone, c.last_name, c.email
from user c
group by c.id, c.mobile_phone, c.last_name, c.email
HAVING count(*) > 1;

takes minutes.

I need to select the records that have a given mobile_phone and, from that result, select the records that share the same email.

For example, I tried this:

select * from (select * from user where mobile_phone = '00222334422222') ou
where (select count(*) from user inr
       where inr.email = ou.email) > 1;

and this:

SELECT
    *,
    ROW_NUMBER() OVER (
        PARTITION BY email, mobile_phone
        ORDER BY email, mobile_phone) AS Row_Number
FROM (select * from user where mobile_phone = '2225776676788') as "c*";

For example:

select * from employee where
        mobile_phone = '75777302722';

The fields shown in the figure should be in the final result set.

Here is one version:

select * from (select * from employee where
                                          mobile_phone = '75777302722') ou
where (select count(*) from employee inr
       where inr.email = ou.email) > 1;

This code does not work on some versions of PostgreSQL.

PostgreSQL 14.7: it works.

PostgreSQL 13.7: it doesn't work.

I get the rows with the same phone number, but I can't select only those rows (from this dataset) where the email is the same. Rows where the email is NULL should not be included.

For example, 5 users have the same phone number.
Of these 5 users, 2 have the same email; those 2 should be included in the final data set.

Any ideas on how to do this?

Answer 1

Score: 1

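SELECT id, mobile_phone, last_name, email
FROM (
    -- "user" must be double-quoted: USER is a reserved word in PostgreSQL
    SELECT *, row_number() OVER (PARTITION BY mobile_phone, email) AS rn
    FROM "user" ) user_rn
WHERE mobile_phone = '75777302722'
  AND rn = 2;  -- one row per duplicated (mobile_phone, email) pair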

In the inner query, the `row_number()` function numbers the rows within each combination of `mobile_phone, email`. The outer query then selects the combinations with multiple hits, returning one row for each (note that rows with `rn = 3` and higher should not be included, because `rn = 2` already supplies the combination).
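If every row of a duplicated pair is needed (the question asks for both of the 2 users who share an email), one possible variation is an `EXISTS` self-join, which also avoids `count(*)`. This is only a sketch and not part of the original answer: it assumes the table is named `employee`, as in the question's later examples, and that `id` is a unique key.

```sql
-- Sketch: return all rows for one phone number whose email is shared
-- with at least one other row that has the same phone number.
-- Assumptions: table name employee, id is unique.
SELECT u.id, u.mobile_phone, u.last_name, u.email
FROM employee AS u
WHERE u.mobile_phone = '75777302722'
  AND u.email IS NOT NULL          -- explicit; the equality below already rejects NULL
  AND EXISTS (
        SELECT 1
        FROM employee AS other
        WHERE other.mobile_phone = u.mobile_phone
          AND other.email = u.email
          AND other.id <> u.id     -- a different row with the same phone and email
      );
```

Because `other.email = u.email` is never true when either side is NULL, rows with a NULL email drop out on their own; the `IS NOT NULL` test only makes that intent visible.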

This solution does not use the `count(*)` function, which is notoriously slow. If performance is an issue, you should add an index on `(mobile_phone, email)`. Any table with more than a few thousand records (or, more precisely, with more than a few physical pages of data storage) will benefit from appropriate indexes tuned to the typical queries.
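The index suggestion could look roughly like the following. The index name is hypothetical, and the table name `employee` is an assumption taken from the question's later examples.

```sql
-- Hypothetical index name; adjust the table name to your schema.
CREATE INDEX employee_mobile_phone_email_idx
    ON employee (mobile_phone, email);

-- Check whether the planner actually uses the index for the lookup.
EXPLAIN ANALYZE
SELECT id, mobile_phone, last_name, email
FROM employee
WHERE mobile_phone = '75777302722'
  AND email IS NOT NULL;
```

On a table that is being written to, `CREATE INDEX CONCURRENTLY` builds the index without blocking writes, at the cost of a slower build.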

Posted on 2023-02-27 01:39:17
Link to this article: https://go.coder-hub.com/75573863.html