2023-07-27 20:34:31

Need to create a column in a dataset based on the value in a different dataset

Question

I have a dataset that has a client id, and a uniform random variable for each id that looks something like this:

ClientId	RandUnif
1234	0.38223456
5678	0.7456789

*client ids are unique and serve as the primary key for this table

I then have another data set that has a "number of outcomes" field and a cumulative probability column, and looks like this:

NumOutcomes	CumulativeProb
1	0.543678
2	0.756789

My goal is to compare the uniform random variable from my client id table to the cumulative probability column in the second table in order to create a new column in the client table.

So, for instance, the first row of the client table would have a value of "1" in the new column because its random value is less than or equal to the first cumulative probability. The second row would have a "2" in the new column because its random value is greater than the first cumulative probability but less than or equal to the second, and so on.

I have tried using a do loop and a SQL query, to no avail.
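To make the example reproducible, the two sample tables could be created roughly as follows. This is only a sketch: the data set names `have` and `lookup` are borrowed from the first answer, and the values are the two sample rows shown above.

```sas
/* Client table: one row per client, with a uniform random draw */
data have;
    input ClientID RandUnif;
    datalines;
1234 0.38223456
5678 0.7456789
;
run;

/* Lookup table: cumulative probability for each number of outcomes */
data lookup;
    input NumOutcomes CumulativeProb;
    datalines;
1 0.543678
2 0.756789
;
run;
```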

Answer 1

Score: 1

You can do a cross join filtered on RandUnif <= CumulativeProb. This would get you a table like this:

ClientID	NumOutcomes	RandUnif	CumulativeProb
1234	1	0.38223456	0.543678
1234	2	0.38223456	0.756789
5678	2	0.7456789	0.756789

From there, keep the row with the minimum CumulativeProb for each ClientID: among the cumulative probabilities that are greater than or equal to RandUnif, the smallest one marks the outcome bucket that the random value falls into. You can do this with a SQL HAVING clause.

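proc sql;
    create table want as
        select ClientID
             , NumOutcomes
             , RandUnif
             , CumulativeProb
        /* Cross join, keeping only lookup rows at or above the random value */
        from have, lookup
        where RandUnif LE CumulativeProb
        group by ClientID
        /* PROC SQL remerges min() with the detail rows, so this keeps
           the single row with the smallest CumulativeProb per client */
        having CumulativeProb = min(CumulativeProb)
    ;
quit;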

The final result looks like this:

ClientID	NumOutcomes	RandUnif	CumulativeProb
1234	1	0.38223456	0.543678
5678	2	0.7456789	0.756789
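One edge case may be worth keeping in mind: a client whose RandUnif is larger than every CumulativeProb (for example, if the lookup table does not run all the way up to 1) has no matching row and is dropped by the filter above. A possible variant, sketched here under the assumption that NumOutcomes increases together with CumulativeProb, uses a left join and takes the smallest matching NumOutcomes directly. The output name `want_all` is made up for illustration.

```sas
proc sql;
    create table want_all as
        select h.ClientID
             , h.RandUnif
             /* Smallest outcome whose cumulative probability covers RandUnif;
                assumes NumOutcomes rises with CumulativeProb */
             , min(l.NumOutcomes) as NumOutcomes
        from have as h
        left join lookup as l
            on h.RandUnif <= l.CumulativeProb
        group by h.ClientID, h.RandUnif
    ;
quit;
```

With this form, clients without any match stay in the output with a missing NumOutcomes, and no remerging of summary statistics is needed because every selected column is either grouped or aggregated.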

Answer 2

Score: 0

Using the answer from Stu, this is what ended up working for me:

proc sql;
    create table output as
        select clientid
             , col2
             , other_cols_needed
        from (select *
              from client_table t1
              inner join cum_prob_table t2
              on t1.rand_unif <= t2.cumulative_probability)
        group by clientid
        having cumulative_probability = min(cumulative_probability);
quit;

Thanks again!
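Since the question mentions trying a do loop, a DATA step version of the same lookup might look roughly like the sketch below. It assumes the data sets are named `have` and `lookup` as in the first answer, and that `lookup` is already sorted by ascending CumulativeProb. The output name `want_ds` and the flag variable `found` are made up for illustration.

```sas
data want_ds;
    set have;
    found = 0;
    /* Walk the lookup rows in order until one covers the random value.
       i and n are temporary (POINT= / NOBS=) and are not written out. */
    do i = 1 to n until (found);
        set lookup point=i nobs=n;
        found = (RandUnif <= CumulativeProb);
    end;
    /* No covering row: leave the lookup columns missing */
    if not found then call missing(NumOutcomes, CumulativeProb);
    drop found;
run;
```

The loop stops at the first lookup row whose CumulativeProb is at least RandUnif, so each client is matched to the same row that the SQL versions select.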
