Self join – 更新一个变量 – proc sql

huangapple go评论85阅读模式
英文:

Self join - update a variable - proc sql

问题

proc sql;
  create table MyTable2 as
  select t1.*, t2.age as age_partner
  from MyTable as t1
  left join MyTable as t2
    on t1.household = t2.household
    and t1.flag = t2.flag
    and t1.gender ne t2.gender;
quit;
    input Household age gender $ flag $ age_partner;
    datalines;
1 45 male married
1 35 female married 45
2 50 female married 52
2 52 male married 50
3 32 male child  
3 50 male married 60
3 60 female married 50
;

You can achieve the desired output by using a single LEFT JOIN within the PROC SQL query. This will match each person with their spouse based on the same household and flag (married), and ensure that the age is associated with the correct gender.

英文:

I have this table:

data MyTable;
    input Household age gender $ flag;
    datalines;
1 45 male married
1 35 female married
2 50 female married
2 52 male married
3 32 male child
3 50 male married
3 60 female married
;

I have married couples and other persons in households; no intergenerational or same sex couples. I need to join a husband value to the wife and a wife value to the husband to get this output:

data MyTable2;
    input Household age gender $ flag $ age_partner;
    datalines;
1 45 male married 35
1 35 female married 45
2 50 female married 52
2 52 male married 50
3 32 male child  
3 50 male married 60
3 60 female married 50
;

That does not work; it executes only the t2 part.

proc sql;
  create table MyTable2 as
  select t1.*, t2.age as age_partner, t3.age as age_partner
  from MyTable as t1
  left join (select household, age from MyTable where gender = 'male' & flag = 'married') as t2
    on t1.household=t2.household & t1.gender = 'female' & t1.flag = 'married'
  left join (select household, age from MyTable where gender = 'female' & flag = 'married') as t3
    on t1.household=t3.household & t1.gender = 'male' & t1.flag = 'married'; 
quit;
    input Household age gender $ flag $ age_partner;
    datalines;
1 45 male married
1 35 female married 45
2 50 female married 52
2 52 male married
3 32 male child  
3 50 male married
3 60 female married 50
;

This is the output. It joins the values only to the females.

If I separate these two joins, the last self join rewrites the previous join and in the output only the husbands will have the wife ages and the wives won't have the husband ages.

I can do it in separate columns and after that in the third column I make the needed output.

How do I do it in one proc?

答案1

得分: 2

假设你的逻辑是成立的(我怀疑这在各种数据质量原因下可能不成立)。

proc sql;
create table want as
select t1.*, case when t1.flag='married' then t2.age else . end as age_partner
from mytable as t1
left join (select * from mytable where flag='married') as t2
on t1.household=t2.household and t1.gender ne t2.gender;
quit;
英文:

Assuming your logic will hold (I suspect this will not hold for various data quality reasons).

proc sql;
create table want as
select t1.*, case when t1.flag='married' then t2.age else . end as age_partner
from mytable as t1
left join (select * from mytable where flag='married') as t2
on t1.household=t2.household and t1.gender ne t2.gender;
quit;

huangapple
  • 本文由 发表于 2023年8月9日 17:04:22
  • 转载请务必保留本文链接:https://go.coder-hub.com/76866165-2.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定