需要根据不同数据集中的值在一个数据集中创建一列。

huangapple go评论60阅读模式
英文:

Need to create a column in a dataset based on the value in a different dataset

问题

我有一个数据集,其中包含客户ID和每个ID的均匀随机变量,看起来像这样:

客户ID 随机均匀变量
1234 0.38223456
5678 0.7456789

*客户ID是唯一的,充当此表的主键

然后我有另一个数据集,其中有一个“结果数字段”和一个累积概率列,看起来像这样:

结果数 累积概率
1 0.543678
2 0.756789

我的目标是将客户ID表中的均匀随机变量与第二个表中的累积概率列进行比较,以创建客户表中的新列。

因此,例如,客户表的第一行将在新列中具有值“1”,因为它小于或等于第一个累积概率。第二行将在新列中具有“2”,因为它大于第一个累积概率但小于第二个累积概率,依此类推。

我尝试过使用do循环和SQL查询,但都没有成功。

英文:

I have a dataset that has a client id, and a uniform random variable for each id that looks something like this:

ClientId RandUnif
1234 0.38223456
5678 0.7456789

*client ids are unique and serve as the primary key for this table

I then have another data set that has a “number of outcomes field” and a cumulative probability column that looks like this:

NumOutcomes CumulativeProb
1 0.543678
2 0.756789

My goal is to compare the uniform random variable from my client id table to the cumulative probability column in the second table in order to create a new column in the client table.

So, for instance the first row of the client table would have a value in the new column of “1” because it is less than our equal to the first cumulative probability. The second row would have a “2” in the new column because it’s greater than the first cumulative probability but less than the second, and so on.

I have tried using a do loop, and a sql query to no avail

答案1

得分: 1

您可以执行一个交叉连接(cross-join),其中RandomUnif <= CumulativeProb。这将为您获得以下表格:

ClientID	NumOutcomes	RandUnif	CumulativeProb
1234	    1	        0.38223456	0.543678
1234	    2	        0.38223456	0.756789
5678	    2	        0.7456789	0.756789

然后,您将按每个ClientID的CumulativeProb的最小值进行筛选,因为最小值小于或等于第一个累积概率。您可以使用SQL的HAVING子句来实现这一点:

proc sql;
    create table want as
        select ClientID
             , NumOutcomes
             , RandUnif
             , CumulativeProb
        from have, lookup
        where RandUnif LE CumulativeProb
        group by ClientID
        having CumulativeProb = min(CumulativeProb)
    ;
quit;

最终结果如下:

ClientID	NumOutcomes	RandUnif	CumulativeProb
1234	    1	        0.38223456	0.543678
5678	    2	        0.7456789	0.756789
英文:

You can do a cross-join where RandomUnif <= CumulativeProb. This would get you a table like this:

ClientID	NumOutcomes	RandUnif	CumulativeProb
1234	    1	        0.38223456	0.543678
1234	    2	        0.38223456	0.756789
5678	    2	        0.7456789	0.756789

From there, you'll take the minimum value of CumulativeProb by each ClientID since the minimum value is less than or equal to the first cumulative probability. You can do this with a sql having clause.

proc sql;
    create table want as
        select ClientID
             , NumOutcomes
             , RandUnif
             , CumulativeProb
        from have, lookup
        where RandUnif LE CumulativeProb
        group by ClientID
        having CumulativeProb = min(CumulativeProb)
    ;
quit;
ClientID	NumOutcomes	RandUnif	CumulativeProb
1234	    1	        0.38223456	0.543678
5678	    2	        0.7456789	0.756789

答案2

得分: 0

使用 Stu 的回答,以下是最终对我有效的代码:

proc sql;

create table output as

select clientid
, col2
, other_cols_needed
from (select *
      from client_table t1
      inner join cum_prob_table t2
      on t1.rand_unif <= t2.cumulative_probability)
group by clientid
having cumulative_probability = min(cumulative_probability); 

quit;

再次感谢!
英文:

Using the answer from Stu this is what ended up working for me:

proc sql; 

create table output as

select clientid
, col2
, other_cols_needed
from (select *
      from client_table t1
      inner join cum_prob_table t2
      on t1.rand_unif &lt;= t2.cumulative_probability)
group by clientid
having cumulative_probability = min(cumulative_probability); 

quit; 

thanks again!

huangapple
  • 本文由 发表于 2023年7月27日 20:34:31
  • 转载请务必保留本文链接:https://go.coder-hub.com/76779794.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定