英文:
Athena looking for records with different start dates
问题
我尝试找出具有多个安装日期的计量表数量。具有重复信息的多行不少见。在尝试不同策略时,我得到了不同的答案,所以我肯定做错了什么。
我已经尝试了以下查询:
选择客户ID,服务点ID,辅助服务点ID
从客户
在 (
选择辅助服务点ID
从客户
按辅助服务点ID分组
有长度(辅助服务点ID) > 1 并且计数(不同的计量表安装日期) > 1
)
选择客户ID,服务点ID,辅助服务点ID,计量表安装日期
从客户
在 (
选择辅助服务点ID
从客户
按辅助服务点ID分组
有计数(不同的计量表安装日期) > 1
)
选择a.服务点ID,a.辅助服务点ID,a.计量表安装日期
从客户 a,客户 b
其中a.服务点ID = b.服务点ID
并且a.辅助服务点ID = b.辅助服务点ID
并且a.计量表安装日期 != b.计量表安装日期
按a.服务点ID,a.辅助服务点ID,a.计量表安装日期分组
我期望的结果是:
客户 | 服务点 | 计量表ID | 计量表安装日期 |
---|---|---|---|
1 | A1 | AM1 | 20201005 |
1 | A1 | AM1 | 20150101 |
1 | A3 | AM3 | 20200509 |
1 | A3 | AM3 | 20221013 |
我认为我没有处理服务点有多个计量表且其中一个计量表有多个启动日期的情况。感谢您的帮助!
英文:
I have a lot of customer files with I customer data that includes a customer id which can have multiple service points. A service point can have a meter and a meter can have a meter install date:
Cust | Service Point | Meter ID | Meter Install Date |
---|---|---|---|
1 | A1 | AM1 | 20201005 |
1 | A1 | AM1 | 20201005 |
1 | A1 | AM1 | 20201005 |
1 | A1 | AM1 | 20150101 |
1 | A1 | AM1 | 20150101 |
1 | A1 | AM1 | 20150101 |
1 | A2 | AM2 | 20220110 |
1 | A2 | AM2 | 20220110 |
1 | A2 | AM2 | 20220110 |
1 | A2 | AM21 | 20230215 |
1 | A3 | AM3 | 20200509 |
1 | A3 | AM3 | 20200509 |
1 | A3 | AM3 | 20200509 |
1 | A3 | AM3 | 20221013 |
I'm trying to find the number of meters that have a multiple install dates. It is not uncommon to have multiple rows where these field's information is duplicated. As I try different strategies I get different answers so I'm doing something wrong.
I've tried:
select customer_id, service_point_id, secondary_sp_id
from customer
where secondary_sp_id in (
select secondary_sp_id
from customer
group by secondary_sp_id
having length(secondary_sp_id) > 1 and count(distinct meter_install_date) > 1
select customer_id, service_point_id, secondary_sp_id, meter_install_date
from customer
where secondary_sp_id in (
select secondary_sp_id
from customer
group by secondary_sp_id having count(distinct meter_install_date) > 1 )
select a.service_point_id, a.secondary_sp_id, a.meter_install_date
from customer a, customer b
where a.service_point_id = b.service_point_id
and a.secondary_sp_id = b.secondary_sp_id
and a.meter_install_date != b.meter_install_date
group by a.service_point_id, a.secondary_sp_id, a.meter_install_date
I would expect to get back:
Cust | Service Point | Meter ID | Meter Install Date |
---|---|---|---|
1 | A1 | AM1 | 20201005 |
1 | A1 | AM1 | 20150101 |
1 | A3 | AM3 | 20200509 |
1 | A3 | AM3 | 20221013 |
I don't think I'm handling when a service point has multiple meters and one of those meters has multiple start dates. Thanks for your help!
答案1
得分: 2
I'm not sure we have enough information of your data or schema, such as how "secondardy_sp_id" fits into this. No details were provided on that column nor the prod_peco_customer table.
If we assume your data appears like your first formatted section in the question, then the following CTE would work as-is.
create table customer (
cust integer,
service_point varchar(5),
meter_id varchar(5),
meter_install_date date
);
insert into customer values
(1, 'A1', 'AM1', '20201005'),
(1, 'A1', 'AM1', '20150101'),
(1, 'A2', 'AM2', '20230110');
with target_meters as (
select meter_id
from customer
group by meter_id
having count(distinct meter_install_date) > 1
)
select c.*
from customer c
join target_meters t
on c.meter_id = t.meter_id;
cust | service_point | meter_id | meter_install_date |
---|---|---|---|
1 | A1 | AM1 | 2020-10-05T00:00:00.000Z |
1 | A1 | AM1 | 2015-01-01T00:00:00.000Z |
But I kinda doubt your data looks like this even though you formatted it that way in the question. Adjust accordingly, but main point is that you could use a sub-query or CTE for identifying your meters with multiple install dates.
----------Update-----------
Based on the updated sample data, then you would simply need to change select c.*
to select distinct c.*
such as this...
with target_meters as (
select meter_id
from customer
group by meter_id
having count(distinct meter_install_date) > 1
)
select distinct c.*
from customer c
join target_meters t
on c.meter_id = t.meter_id
order by 1,2,3,4
cust | service_point | meter_id | meter_install_date |
---|---|---|---|
1 | A1 | AM1 | 2015-01-01T00:00:00.000Z |
1 | A1 | AM1 | 2020-10-05T00:00:00.000Z |
1 | A3 | AM3 | 2020-05-09T00:00:00.000Z |
1 | A3 | AM3 | 2022-10-13T00:00:00.000Z |
英文:
I'm not sure we have enough information of your data or schema, such as how "secondardy_sp_id" fits into this. No details were provided on that column nor the prod_peco_customer table.
If we assume your data appears like your first formatted section in the question, then the following CTE would work as-is.
create table customer (
cust integer,
service_point varchar(5),
meter_id varchar(5),
meter_install_date date
);
insert into customer values
(1, 'A1', 'AM1', '20201005'),
(1, 'A1', 'AM1', '20150101'),
(1, 'A2', 'AM2', '20230110');
with target_meters as (
select meter_id
from customer
group by meter_id
having count(distinct meter_install_date) > 1
)
select c.*
from customer c
join target_meters t
on c.meter_id = t.meter_id;
cust | service_point | meter_id | meter_install_date |
---|---|---|---|
1 | A1 | AM1 | 2020-10-05T00:00:00.000Z |
1 | A1 | AM1 | 2015-01-01T00:00:00.000Z |
But I kinda doubt your data looks like this even though you formatted it that way in the question. Adjust accordingly, but main point is that you could use a sub-query or CTE for identifying your meters with multiple install dates.
----------Update-----------
Based on the updated sample data, then you would simply need to change select c.*
to select distinct c.*
such as this...
with target_meters as (
select meter_id
from customer
group by meter_id
having count(distinct meter_install_date) > 1
)
select distinct c.*
from customer c
join target_meters t
on c.meter_id = t.meter_id
order by 1,2,3,4
cust | service_point | meter_id | meter_install_date |
---|---|---|---|
1 | A1 | AM1 | 2015-01-01T00:00:00.000Z |
1 | A1 | AM1 | 2020-10-05T00:00:00.000Z |
1 | A3 | AM3 | 2020-05-09T00:00:00.000Z |
1 | A3 | AM3 | 2022-10-13T00:00:00.000Z |
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论