SAS: finding overlapping dates and adding a flag
Question
I have a dataset at the ID level in which some date ranges overlap. I need to find those rows and add an indicator so that I can count the overlapping records.
Data:
ID ITEM StrDate EndDate
1001 A121 02/01/2022 02/15/2022
1001 B121 03/10/2022 03/10/2022
1002 C121 02/01/2022 02/10/2022
1002 D121 02/05/2022 02/15/2022
1003 E121 03/10/2022 03/21/2022
1003 F121 03/12/2022 03/21/2022
1004 G121 01/12/2022 01/14/2022
Below is the result I am expecting.
Want:
ID ITEM StrDate EndDate Indicator
1001 A121 02/01/2022 02/15/2022 N
1001 B121 03/10/2022 03/10/2022 N
1002 C121 02/01/2022 02/10/2022 Y
1002 D121 02/05/2022 02/15/2022 Y
1003 E121 03/10/2022 03/21/2022 Y
1003 F121 03/12/2022 03/21/2022 Y
1004 G121 01/12/2022 01/14/2022 N
I first tried sorting the data by ID, StrDate and EndDate:
proc sort data=Data; by ID StrDate EndDate; run;
Then I tried using the lag function to compare rows with the same ID and subtract the dates, but I realized that is not the right approach.
I appreciate your help. Thanks!
Answer 1
Score: 2
SAS date values are integers, so they can be used directly as an index into a tracking array. This technique is called a direct-index search.
Example:
A double DOW solution can be coded to find the overlapping records. The first loop flags dates in use and the second loop evaluates the range for an overlap by finding a flag via direct-index.
data have;
input ID ITEM $ StrDate EndDate;
attrib strdate enddate format=mmddyy10. informat=mmddyy10.;
datalines;
1001 A121 02/01/2022 02/15/2022
1001 B121 03/10/2022 03/10/2022
1002 C121 02/01/2022 02/10/2022
1002 D121 02/05/2022 02/15/2022
1003 E121 03/10/2022 03/21/2022
1003 F121 03/12/2022 03/21/2022
1004 G121 01/12/2022 01/14/2022
;
data want;
  array tracker(100000) _temporary_;

  /* first DOW loop: count how many records of this ID use each date */
  do _n_ = 1 by 1 until (last.id);
    set have;
    by id;
    do _i_ = strdate to enddate;
      tracker(_i_) + 1; /* flag date using direct-index */
    end;
  end;

  /* second DOW loop: reread the same rows and test each range */
  do _n_ = 1 to _n_;
    set have;
    /* no overlap means every date in the range has a count of exactly 1, */
    /* so the loop exits with _i_ > enddate */
    do _i_ = strdate to enddate while (tracker(_i_) = 1);
    end;
    length overlap_indicator $1;
    overlap_indicator = ifc (_i_ > enddate, 'N', 'Y');
    output;
  end;

  call missing (of tracker(*)); /* reset the tracker for the next ID */
  drop _: ;
run;
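One thing worth checking before relying on the direct index: the tracker array above has subscripts 1 to 100000, so every date from StrDate to EndDate must fall in that range as a raw integer. A minimal sketch of such a check, assuming the dataset is named have as above (min_index and max_index are illustrative names, not part of the original answer):

```sas
/* Sketch: show the smallest and largest date values as plain integers, */
/* to compare against the tracker array bounds (1 to 100000).           */
proc sql;
  select min(StrDate) as min_index format=best12.,
         max(EndDate) as max_index format=best12.
  from have;
quit;
```

SAS stores dates as the number of days since 01JAN1960, so dates in 2022 land a little above 22,000 and fit comfortably. Dates before 1960 are negative; if your data could contain them, the array would need explicit bounds, for example array tracker(-30000:100000) _temporary_; (the lower bound here is only an assumption about the data).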
Answer 2
Score: 1
My approach: expand each range into one row per date, count the dates, and remerge.
*An extra observation added to ID 1002;
data have;
input ID $ ITEM $ StrDate mmddyy10. +1 EndDate mmddyy10.;
format StrDate EndDate mmddyy10.;
cards;
1001 A121 02/01/2022 02/15/2022
1001 B121 03/10/2022 03/10/2022
1002 C121 02/01/2022 02/10/2022
1002 D121 02/05/2022 02/15/2022
1002 D121 03/05/2022 03/15/2022
1003 E121 03/10/2022 03/21/2022
1003 F121 03/12/2022 03/21/2022
1004 G121 01/12/2022 01/14/2022
;
run;
*Extend;
data middle;
  set have;
  do date=StrDate to EndDate;
    output;
  end;
run;
*Count and remerge;
proc sql noprint;
  create table want as
  select distinct a.*, ifc(b.count and a.StrDate<=b.date<=a.EndDate,'Y','N') as Indicator
  from have as a
  left join (
    select id, date, count(date) as count from middle
    group by id, date
    having count>1
  ) as b on a.id=b.id
  ;
quit;
By the way, if only some of an ID's records overlap but you want to flag every record of that ID, modify the lookup condition by removing a.StrDate<=b.date<=a.EndDate from the IFC expression.
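Since the question also asks for a count of the overlapping records, the Indicator column can be summarized afterwards. A sketch, assuming the want table and Indicator column created above (n_overlap is an illustrative name):

```sas
/* Sketch: count the flagged records per ID.                 */
/* A comparison evaluates to 1 (true) or 0 (false) in SAS,   */
/* so summing it counts the rows where Indicator is 'Y'.     */
proc sql;
  select ID,
         sum(Indicator = 'Y') as n_overlap
  from want
  group by ID;
quit;
```

A PROC FREQ step with tables ID*Indicator; on the same table should give a similar breakdown if a cross-tabulation is preferred.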
Answer 3
Score: 0
Simple overlap logic:
proc sql;
  create table want as
  select
    a.*,
    /* simple overlap logic */
    case
      when a.strdate <= b.strdate & a.enddate >= b.strdate then 'Y'
      when b.strdate < a.strdate & b.enddate >= a.strdate then 'Y'
      else 'N'
    end as overlap
  from
    have a
  left join
    have b
  on a.id = b.id /* join on same ids */
  and a.item <> b.item /* but not the same item */
  ;
quit;
Result:
ID ITEM StrDate EndDate overlap
1001 B121 03/10/2022 03/10/2022 N
1001 A121 02/01/2022 02/15/2022 N
1002 D121 02/05/2022 02/15/2022 Y
1002 C121 02/01/2022 02/10/2022 Y
1003 E121 03/10/2022 03/21/2022 Y
1003 F121 03/12/2022 03/21/2022 Y
1004 G121 01/12/2022 01/14/2022 N
When StartA <= StartB, the ranges overlap if EndA >= StartB:
StartA        EndA >= StartB
|-------------|
        |---------
        StartB
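The two WHEN branches above can be collapsed into one symmetric test: two closed ranges overlap when each one starts on or before the other ends. The sketch below uses that test and also groups the join result, so each record comes back once even when an ID has more than two rows. It assumes ITEM is unique within an ID (as the join above does) and that StrDate <= EndDate on every row; want_grouped is an illustrative table name.

```sas
/* Sketch: one output row per record, flagged 'Y' if any other record */
/* of the same ID overlaps it. Unmatched rows have a missing b.item,  */
/* so count(b.item) is 0 and the flag is 'N'.                         */
proc sql;
  create table want_grouped as
  select a.id, a.item, a.strdate, a.enddate,
         case when count(b.item) > 0 then 'Y' else 'N' end as overlap
  from have as a
  left join have as b
    on  a.id = b.id
    and a.item ne b.item
    and a.strdate <= b.enddate
    and b.strdate <= a.enddate
  group by a.id, a.item, a.strdate, a.enddate;
quit;
```

Putting the overlap test in the ON clause instead of the CASE expression means only overlapping partners are joined, which keeps the grouped count simple.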