How to select records by condition in PostgreSQL, and from this set select records with the same value of some field?
Question
There is a table with millions of records.
A query like this:
select c.id, c.mobile_phone, c.last_name, c.email
from user c
group by c.id, c.mobile_phone, c.last_name, c.email
HAVING count(*) > 1;
takes minutes.
I need to select the records with the same mobile_phone and, from the resulting selection, select the records that have the same email.
For example, I tried this:
select * from (select * from user where mobile_phone = '00222334422222') ou
where (select count(*) from user inr
where inr.email = ou.email) > 1;
and this:
SELECT
*,
ROW_NUMBER() OVER (
PARTITION BY email, mobile_phone
ORDER BY email, mobile_phone) AS Row_Number
FROM (select * from user where mobile_phone = '2225776676788') as "c*";
For example, given the rows returned by:
select * from employee where
mobile_phone = '75777302722';
the fields shown in the figure should be in the final data sample.
This is one version:
select * from (select * from employee where
mobile_phone = '75777302722') ou
where (select count(*) from employee inr
where inr.email = ou.email) > 1;
This code does not work on some versions of PostgreSQL.
PostgreSQL 14.7: it works.
PostgreSQL 13.7: it doesn't work.
I get the rows with the same phone number, but I can't select only those rows (from this dataset) where the email is the same. Rows where email is null should not be included.
5 users have the same phone number.
Of these 5 users, 2 have the same email; they should be included in the final data set.
Any ideas on how to do this?
Answer 1
Score: 1
SELECT id, mobile_phone, last_name, email
FROM (
    -- "user" is a reserved word in PostgreSQL, so the table name must be quoted
    SELECT *, row_number() OVER (PARTITION BY mobile_phone, email) AS rn
    FROM "user"
) user_rn
WHERE mobile_phone = '75777302722'
  AND rn = 2;
In the inner query, the row_number() function numbers the rows within each (mobile_phone, email) combination. The outer query then keeps the row numbered 2, which exists only when a combination occurs more than once. Rows with rn = 3 and higher should not be included, because rn = 2 already supplies the combination. Note that this returns one row per duplicated combination, not every row in it.
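If every row of a duplicated group is needed (both of the 2 users in the question's example), a variant along the following lines might work. This is only a sketch: it swaps row_number() for a count(*) window over the already-filtered rows, and the table employee_demo and its sample data are made up for illustration. The email IS NOT NULL filter is there because PARTITION BY puts all NULL emails into one group, which the question wants to exclude.

```sql
-- Hypothetical demo table; the column names follow the question.
CREATE TEMP TABLE employee_demo (
    id           integer PRIMARY KEY,
    mobile_phone text,
    last_name    text,
    email        text
);

-- Five users share one phone number; two of them share an email,
-- two have no email at all. Row 6 has a different phone number.
INSERT INTO employee_demo (id, mobile_phone, last_name, email) VALUES
    (1, '75777302722', 'Adams', 'a@example.com'),
    (2, '75777302722', 'Baker', 'a@example.com'),
    (3, '75777302722', 'Clark', 'c@example.com'),
    (4, '75777302722', 'Davis', NULL),
    (5, '75777302722', 'Evans', NULL),
    (6, '11111111111', 'Frank', 'a@example.com');

SELECT id, mobile_phone, last_name, email
FROM (
    -- Count how many rows with this phone number share each email.
    SELECT *, count(*) OVER (PARTITION BY email) AS same_email
    FROM employee_demo
    WHERE mobile_phone = '75777302722'
      AND email IS NOT NULL          -- NULL emails must not form a group
) dup
WHERE same_email > 1
ORDER BY id;
-- Returns the rows with id 1 and 2.
```

Because the phone filter sits inside the subquery, the window only ever sees the handful of rows for that phone number, so the count here is cheap even on a large table.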
This solution does not use the count(*) function, which can be slow on large tables. If performance is an issue, you should add an index on (mobile_phone, email). Any table with more than a few thousand records (or, more precisely, with more than a few physical pages of data storage) will benefit from appropriate indexes tuned to the typical queries.
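As a rough illustration of the index suggestion, something like the following could be tried. The table employee_idx_demo and the index name are hypothetical stand-ins; on the real table the same CREATE INDEX would target the actual table name.

```sql
-- Hypothetical stand-in for the real table.
CREATE TEMP TABLE employee_idx_demo (
    id           integer PRIMARY KEY,
    mobile_phone text,
    last_name    text,
    email        text
);

-- Composite index: mobile_phone first, since the query filters on it
-- with equality; email second, since rows are then grouped by it.
CREATE INDEX IF NOT EXISTS employee_idx_demo_phone_email_idx
    ON employee_idx_demo (mobile_phone, email);

-- Check whether the planner actually uses the index for the lookup.
EXPLAIN (ANALYZE, BUFFERS)
SELECT id, mobile_phone, last_name, email
FROM employee_idx_demo
WHERE mobile_phone = '75777302722'
  AND email IS NOT NULL;
```

On a tiny or empty table the planner will probably still pick a sequential scan, so the plan is only meaningful when checked against realistic data volumes.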