SAS 寻找重叠日期并添加标志

huangapple go评论62阅读模式
英文:

SAS finding overlapping dates and add a flag

问题

我有一个以ID为级别的数据集,其中有一些日期重叠。我只需要找到这些行并添加一个标识符来计算重叠记录的数量。

数据:

ID     ITEM      StrDate             EndDate
1001   A121       02/01/2022        02/15/2022
1001   B121       03/10/2022        03/10/2022
1002   C121       02/01/2022        02/10/2022
1002   D121       02/05/2022        02/15/2022
1003   E121       03/10/2022        03/21/2022
1003   F121       03/12/2022        03/21/2022
1004   G121       01/12/2022        01/14/2022

以下是我期望的结果:

ID     ITEM      StrDate             EndDate             Indicator
1001   A121       02/01/2022        02/15/2022              N
1001   B121       03/10/2022        03/10/2022              N
1002   C121       02/01/2022        02/10/2022              Y
1002   D121       02/05/2022        02/15/2022              Y
1003   E121       03/10/2022        03/21/2022              Y
1003   F121       03/12/2022        03/21/2022              Y
1004   G121       01/12/2022        01/14/2022              N

我尝试首先按StrDateEndDate对数据进行排序:

Proc sort data=Data; by ID StrDate EndDate;run;

然后我尝试使用lag函数来找到相同的ID并计算日期之间的差值,但我发现这不是正确的做法。

我感谢你的帮助。谢谢!

英文:

I have a dataset at id level with some overlapping dates. All I need to find those rows and add an identifier to count the number overlapping records.

   Data:
          ID     ITEM      StrDate             EndDate
          1001   A121       02/01/2022        02/15/2022
          1001   B121       03/10/2022        03/10/2022
          1002   C121       02/01/2022        02/10/2022
          1002   D121       02/05/2022        02/15/2022
          1003   E121       03/10/2022        03/21/2022
          1003   F121       03/12/2022        03/21/2022
          1004   G121       01/12/2022        01/14/2022

Below is the Result that I am expecting

   Want:
          ID     ITEM      StrDate             EndDate             Indicator
          1001   A121       02/01/2022        02/15/2022              N
          1001   B121       03/10/2022        03/10/2022              N
          1002   C121       02/01/2022        02/10/2022              Y
          1002   D121       02/05/2022        02/15/2022              Y
          1003   E121       03/10/2022        03/21/2022              Y
          1003   F121       03/12/2022        03/21/2022              Y
          1004   G121       01/12/2022        01/14/2022              N

I tried sorting the data first on StrDate and EndDate

            Proc sort data=Data; by ID StrDate EndDate;run;

Then I tried using lag function to find the same id and subtract the dates but I figured that's not the correct way of doing.

I appreciate your help here. thanks

答案1

得分: 2

SAS日期值是可以用作跟踪数组索引的整数。这种技术称为直接索引搜索。

示例:

可以编写一个双重DOW解决方案来查找重叠记录。第一个循环标记正在使用的日期,第二个循环通过直接索引找到标志以评估重叠的范围。

data have;
input ID ITEM $ StrDate EndDate;
attrib strdate enddate format=mmddyy10. informat=mmddyy10.;
datalines;
 1001 A121 02/01/2022 02/15/2022
 1001 B121 03/10/2022 03/10/2022
 1002 C121 02/01/2022 02/10/2022
 1002 D121 02/05/2022 02/15/2022
 1003 E121 03/10/2022 03/21/2022
 1003 F121 03/12/2022 03/21/2022
 1004 G121 01/12/2022 01/14/2022
;

data want;
  array tracker(100000) _temporary_ ;

  do _n_ = 1 by 1 until (last.id);
    set have;
    by id;
    do _i_ = strdate to enddate;
      tracker(_i_) + 1;              /* 使用直接索引标记日期 */
    end;
  end;

  do _n_ = 1 to _n_;
    set have;

    /* 没有重叠意味着在范围内没有日期会找到一个已设置的标志 */
    /* 循环将以 _i_ > enddate 退出 */
    do _i_ = strdate to enddate while (tracker(_i_) = 1);
    end;

    length overlap_indicator $1; 
    overlap_indicator = ifc (_i_ > enddate, 'N', 'Y');
    output;
  end;

  call missing (of tracker(*));
  drop _: ;
run;
英文:

SAS Date values are integers that can be used as an index into a tracking array. This technique is called a direct-index search.

Example:

A double DOW solution can be coded to find the overlapping records. The first loop flags dates in use and the second loop evaluates the range for an overlap by finding a flag via direct-index.

data have;
input ID ITEM $ StrDate EndDate;
attrib strdate enddate format=mmddyy10. informat=mmddyy10.;
datalines;
 1001 A121 02/01/2022 02/15/2022
 1001 B121 03/10/2022 03/10/2022
 1002 C121 02/01/2022 02/10/2022
 1002 D121 02/05/2022 02/15/2022
 1003 E121 03/10/2022 03/21/2022
 1003 F121 03/12/2022 03/21/2022
 1004 G121 01/12/2022 01/14/2022
;

data want;
  array tracker(100000) _temporary_ ;

  do _n_ = 1 by 1 until (last.id);
    set have;
    by id;
    do _i_ = strdate to enddate;
      tracker(_i_) + 1;              /* flag date using direct-index */
    end;
  end;

  do _n_ = 1 to _n_;
    set have;

    /* no overlap would mean no dates in range would find a flag set */
    /* and loop would exit with _i_ > enddate */
    do _i_ = strdate to enddate while (tracker(_i_) = 1);
    end;

    length overlap_indicator $1; 
    overlap_indicator = ifc (_i_ > enddate, 'N', 'Y');
    output;
  end;

  call missing (of tracker(*));
  drop _: ;
run;

SAS 寻找重叠日期并添加标志

答案2

得分: 1

延伸、计数和重新合并,这是我的想法。

*在ID 1002中添加了额外的观察;
data have;
input ID $ ITEM $ StrDate mmddyy10. +1 EndDate mmddyy10.;
format StrDate EndDate mmddyy10.;
cards;
1001 A121 02/01/2022 02/15/2022
1001 B121 03/10/2022 03/10/2022
1002 C121 02/01/2022 02/10/2022
1002 D121 02/05/2022 02/15/2022
1002 D121 03/05/2022 03/15/2022
1003 E121 03/10/2022 03/21/2022
1003 F121 03/12/2022 03/21/2022
1004 G121 01/12/2022 01/14/2022
;
run;

*延伸;
data middle;
set have;
do date=StrDate to EndDate;
output;
end;
run;

计数和重新合并;
proc sql noprint;
create table want as
select distinct a.
, ifc(b.count and a.StrDate<=b.date<=a.EndDate,'Y','N') as Indicator
from have as a
left join (
select id, date, count(date) as count from middle
group by id, date
having count>1
) as b on a.id=b.id
;
quit;

顺便说一下,如果不是所有日期重叠在一个ID的记录中,但您希望将它们全部标记出来,您需要通过删除a.StrDate<=b.date<=a.EndDate来修改表查找条件。

英文:

Extend, count and remerge, this is my thought.

*An extra observation added to ID 1002;
data have;
  input ID $ ITEM $ StrDate mmddyy10. +1 EndDate mmddyy10.;
  format StrDate EndDate mmddyy10.;
  cards;
  1001 A121 02/01/2022 02/15/2022
  1001 B121 03/10/2022 03/10/2022
  1002 C121 02/01/2022 02/10/2022
  1002 D121 02/05/2022 02/15/2022
  1002 D121 03/05/2022 03/15/2022
  1003 E121 03/10/2022 03/21/2022
  1003 F121 03/12/2022 03/21/2022
  1004 G121 01/12/2022 01/14/2022
 ;
run;

*Extend;
data middle;
  set have;
  do date=StrDate to EndDate;
    output;
  end;
run;

*Count and remerge;
proc sql noprint;
  create table want as 
  select distinct a.*, ifc(b.count and a.StrDate&lt;=b.date&lt;=a.EndDate,&#39;Y&#39;,&#39;N&#39;) as Indicator
  from have as a
  left join (
    select id, date, count(date) as count from middle
    group by id, date
    having count&gt;1
  ) as b on a.id=b.id
  ;
quit;

By the way, if not all records overlapping in dates of one ID but you want to flag all of them out, you need to modify table lookup condition by removing the a.StrDate&lt;=b.date&lt;=a.EndDate.

答案3

得分: 0

简单的重叠逻辑:

proc sql;
create table want as
select
    a.*,
    /* 简单重叠逻辑  */
    case
        when a.strdate &lt;= b.strdate &amp; a.enddate &gt;= b.strdate then &#39;Y&#39;
        when b.strdate &lt; a.strdate &amp; b.enddate &gt;= a.strdate then &#39;Y&#39;
        else &#39;N&#39;
        end as overlap
from
    have  a
    left join
    have  b
        on a.id = b.id         /* 在相同的ID上连接      */
        and a.item &lt;&gt; b.item   /* 但物品不相同 */
;
quit;

结果:

ID	ITEM	StrDate	EndDate	overlap
1001	B121	03/10/2022	03/10/2022	N
1001	A121	02/01/2022	02/15/2022	N
1002	D121	02/05/2022	02/15/2022	Y
1002	C121	02/01/2022	02/10/2022	Y
1003	E121	03/10/2022	03/21/2022	Y
1003	F121	03/12/2022	03/21/2022	Y
1004	G121	01/12/2022	01/14/2022	N

重叠发生在 StartA <= StartB 时:

StartA          EndA&gt;=StartB
   |-------------|
             |---------
          StartB
英文:

Simple overlap logic:

proc sql;
create table want as
select
    a.*,
    /* simple overlap logic  */
    case
        when a.strdate &lt;= b.strdate &amp; a.enddate &gt;= b.strdate then &#39;Y&#39;
        when b.strdate &lt; a.strdate &amp; b.enddate &gt;= a.strdate then &#39;Y&#39;
        else &#39;N&#39;
        end as overlap
from
    have  a
    left join
    have  b
        on a.id = b.id         /* join on same ids      */
        and a.item &lt;&gt; b.item   /* but not the same item */
;
quit;

Result:

ID	ITEM	StrDate	EndDate	overlap
1001	B121	03/10/2022	03/10/2022	N
1001	A121	02/01/2022	02/15/2022	N
1002	D121	02/05/2022	02/15/2022	Y
1002	C121	02/01/2022	02/10/2022	Y
1003	E121	03/10/2022	03/21/2022	Y
1003	F121	03/12/2022	03/21/2022	Y
1004	G121	01/12/2022	01/14/2022	N

Overlap occurs if StartA <= StartB when:

StartA          EndA&gt;=StartB
   |-------------|
             |---------
          StartB

huangapple
  • 本文由 发表于 2023年2月8日 09:31:15
  • 转载请务必保留本文链接:https://go.coder-hub.com/75380583.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定