Athena 寻找具有不同开始日期的记录。

huangapple go评论61阅读模式
英文:

Athena looking for records with different start dates

问题

我尝试找出具有多个安装日期的计量表数量。具有重复信息的多行不少见。在尝试不同策略时,我得到了不同的答案,所以我肯定做错了什么。

我已经尝试了以下查询:

选择客户ID,服务点ID,辅助服务点ID
从客户
 (
  选择辅助服务点ID
  从客户
  按辅助服务点ID分组
  有长度(辅助服务点ID) > 1 并且计数(不同的计量表安装日期) > 1
)
选择客户ID,服务点ID,辅助服务点ID,计量表安装日期
从客户
 (
  选择辅助服务点ID
  从客户
  按辅助服务点ID分组
  有计数(不同的计量表安装日期) > 1
)
选择a.服务点IDa.辅助服务点IDa.计量表安装日期
从客户 a,客户 b
其中a.服务点ID = b.服务点ID
并且a.辅助服务点ID = b.辅助服务点ID
并且a.计量表安装日期 != b.计量表安装日期
a.服务点IDa.辅助服务点IDa.计量表安装日期分组

我期望的结果是:

客户 服务点 计量表ID 计量表安装日期
1 A1 AM1 20201005
1 A1 AM1 20150101
1 A3 AM3 20200509
1 A3 AM3 20221013

我认为我没有处理服务点有多个计量表且其中一个计量表有多个启动日期的情况。感谢您的帮助!

英文:

I have a lot of customer files with I customer data that includes a customer id which can have multiple service points. A service point can have a meter and a meter can have a meter install date:

Cust Service Point Meter ID Meter Install Date
1 A1 AM1 20201005
1 A1 AM1 20201005
1 A1 AM1 20201005
1 A1 AM1 20150101
1 A1 AM1 20150101
1 A1 AM1 20150101
1 A2 AM2 20220110
1 A2 AM2 20220110
1 A2 AM2 20220110
1 A2 AM21 20230215
1 A3 AM3 20200509
1 A3 AM3 20200509
1 A3 AM3 20200509
1 A3 AM3 20221013

I'm trying to find the number of meters that have a multiple install dates. It is not uncommon to have multiple rows where these field's information is duplicated. As I try different strategies I get different answers so I'm doing something wrong.

I've tried:


select customer_id, service_point_id, secondary_sp_id
from customer
where secondary_sp_id in (
  select secondary_sp_id
  from customer
  group by secondary_sp_id
  having length(secondary_sp_id) > 1 and count(distinct meter_install_date) > 1

select customer_id, service_point_id, secondary_sp_id, meter_install_date
from customer
where secondary_sp_id in (
select secondary_sp_id
from customer
group by secondary_sp_id having count(distinct meter_install_date) > 1 )

select a.service_point_id, a.secondary_sp_id, a.meter_install_date 
from customer a, customer b 
where a.service_point_id = b.service_point_id 
and a.secondary_sp_id = b.secondary_sp_id 
and a.meter_install_date != b.meter_install_date 
group by a.service_point_id, a.secondary_sp_id, a.meter_install_date

I would expect to get back:

Cust Service Point Meter ID Meter Install Date
1 A1 AM1 20201005
1 A1 AM1 20150101
1 A3 AM3 20200509
1 A3 AM3 20221013

I don't think I'm handling when a service point has multiple meters and one of those meters has multiple start dates. Thanks for your help!

答案1

得分: 2

I'm not sure we have enough information of your data or schema, such as how "secondardy_sp_id" fits into this. No details were provided on that column nor the prod_peco_customer table.

If we assume your data appears like your first formatted section in the question, then the following CTE would work as-is.

create table customer (
cust integer,
service_point varchar(5),
meter_id varchar(5),
meter_install_date date
);

insert into customer values
(1, 'A1', 'AM1', '20201005'),
(1, 'A1', 'AM1', '20150101'),
(1, 'A2', 'AM2', '20230110');


with target_meters as (
select meter_id
from customer
group by meter_id
having count(distinct meter_install_date) > 1
)
select c.*
from customer c
join target_meters t
on c.meter_id = t.meter_id;

cust service_point meter_id meter_install_date
1 A1 AM1 2020-10-05T00:00:00.000Z
1 A1 AM1 2015-01-01T00:00:00.000Z

But I kinda doubt your data looks like this even though you formatted it that way in the question. Adjust accordingly, but main point is that you could use a sub-query or CTE for identifying your meters with multiple install dates.

----------Update-----------

Based on the updated sample data, then you would simply need to change select c.* to select distinct c.* such as this...

with target_meters as (
select meter_id
from customer
group by meter_id
having count(distinct meter_install_date) > 1
)
select distinct c.*
from customer c
join target_meters t
on c.meter_id = t.meter_id
order by 1,2,3,4

cust service_point meter_id meter_install_date
1 A1 AM1 2015-01-01T00:00:00.000Z
1 A1 AM1 2020-10-05T00:00:00.000Z
1 A3 AM3 2020-05-09T00:00:00.000Z
1 A3 AM3 2022-10-13T00:00:00.000Z
英文:

I'm not sure we have enough information of your data or schema, such as how "secondardy_sp_id" fits into this. No details were provided on that column nor the prod_peco_customer table.

If we assume your data appears like your first formatted section in the question, then the following CTE would work as-is.

create table customer (
  cust integer, 
  service_point varchar(5), 
  meter_id varchar(5), 
  meter_install_date date
  );
  
insert into customer values 
(1, 'A1', 'AM1', '20201005'),
(1, 'A1', 'AM1', '20150101'),
(1, 'A2', 'AM2', '20230110');

with target_meters as (
  select meter_id
  from customer
  group by meter_id
  having count(distinct meter_install_date) > 1
  )
select c.*
from customer c
join target_meters t
  on c.meter_id = t.meter_id;
cust service_point meter_id meter_install_date
1 A1 AM1 2020-10-05T00:00:00.000Z
1 A1 AM1 2015-01-01T00:00:00.000Z

But I kinda doubt your data looks like this even though you formatted it that way in the question. Adjust accordingly, but main point is that you could use a sub-query or CTE for identifying your meters with multiple install dates.

----------Update-----------

Based on the updated sample data, then you would simply need to change select c.* to select distinct c.* such as this...

with target_meters as (
  select meter_id
  from customer
  group by meter_id
  having count(distinct meter_install_date) > 1
  )
select distinct c.*
from customer c
join target_meters t
  on c.meter_id = t.meter_id
order by 1,2,3,4
cust service_point meter_id meter_install_date
1 A1 AM1 2015-01-01T00:00:00.000Z
1 A1 AM1 2020-10-05T00:00:00.000Z
1 A3 AM3 2020-05-09T00:00:00.000Z
1 A3 AM3 2022-10-13T00:00:00.000Z

huangapple
  • 本文由 发表于 2023年2月16日 04:54:17
  • 转载请务必保留本文链接:https://go.coder-hub.com/75465342.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定