SAS: dynamically specify dataset name in hash object in proc fcmp function

Question

I would like to specify the name of the dataset as an input parameter to the fcmp function that is declaring the hash object:

    function somefunction(dsn $, k1 $, k2 $, k3 $, k4 $);
      declare hash h(dataset: "work.someDatasetName");
      *declare hash h(dataset: dsn);
      rc = h.defineKey('k1', 'k2', 'k3', 'k4');
      rc = h.defineData('d1', 'd2', 'd3', 'd4', 'd5', 'd6');
      rc = h.definedone();
      rc = h.find();

The commented-out line, declare hash h(dataset: dsn);, does not work. The declare statement requires a literal or character variable. How can I transfer the value of dsn from the function arguments into the hash object declaration statement?

The error that I get is:

    163  declare hash h(dataset: dsn);
                                 ___
                                 22
                                 202
    ERROR 22-322: Expecting a quoted string.
    ERROR 202-322: The option or parameter is not recognized and will be ignored.


SAS Documentation

> argument_tag:value
>
> Specifies the information that is used to create an instance of the hash object.
> There are five valid hash object argument and value tags:
>
> dataset: 'dataset_name <(datasetoption)>'
>
> Specifies the name of a SAS data set to load into the hash object.
> The name of the SAS data set can be a literal or character variable. The data set name must be enclosed in single or double quotation marks. Macro variables must be enclosed in double quotation marks.

Answer 1

Score: 2

I think this limit is documented.

  • PROC FCMP Hash and Hash Iterator Language Elements
    • DECLARE Statement: Hash Object and Hash Iterator Object

https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/lecompobjref/n098ljby3kptf0n1caygy0dipu2r.htm#p17vi6r6c5zhmen1aqtyxoml73os

> The name of the SAS data set must be a literal. A literal data set
> name must be enclosed in single or double quotation marks.

Answer 2

Score: 0

As pointed out by Tom, it is not possible to pass a string variable argument to the declaration of a hash object within an fcmp function (in the same way that you can for a data step hash object).
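
For contrast, here is a minimal sketch of the data step behaviour referred to above, where the dataset argument tag can take its value from a character variable at run time. The table work.lookup_a, its single key k1, and its single data item d1 are hypothetical and exist only for this illustration.

```sas
/* Hypothetical lookup table, created only for this illustration */
data work.lookup_a;
    length k1 $8 d1 8;
    k1 = 'a'; d1 = 1; output;
run;

data _null_;
    length dsn $41 k1 $8 d1 8;
    call missing(d1);               /* avoid an "uninitialized" note for d1 */
    dsn = 'work.lookup_a';          /* table name held in a character variable */

    /* In a data step the name is evaluated when DECLARE executes */
    declare hash h(dataset: dsn);
    rc = h.defineKey('k1');
    rc = h.defineData('d1');
    rc = h.defineDone();

    k1 = 'a';
    rc = h.find();                  /* rc is 0 when the key is found */
    put rc= d1=;
run;
```

The same declare statement inside proc fcmp is rejected with the errors shown in the question, because there the name must be a quoted literal.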

That said, there may well be an alternative (and better performing) lookup solution.

You mention that this lookup function is to be 'called' millions of times:

  • Do you mean called from within a data step, i.e. where the data step is iterating through (millions of) observations from a 'master' dataset and each observation has variables for the dataset name and the other input parameters listed in your example fcmp function?

  • How many of these lookup tables are there?

  • How large are these lookup tables? How many unique values per dataset? Can they all fit into memory?

  • It would appear from your example fcmp function that these lookup tables have either the same data structure, or at least the same key and data variables required to satisfy the lookup?

Depending on your answers to the above, data step hash object(s), with the addition of some macro code, may well satisfy your requirement - either pre-loaded before iterating through your 'master' dataset or loaded on the fly.

Or there may be an alternative lookup approach that is more appropriate for your particular need. It is difficult to say without more information.

EDIT:

A small number of lookup tables could (depending on size) be preloaded in your data step, i.e. before you start iterating through your master dataset with, for example, do until(eof_master) code.

I would then use if/then logic (or select/when) within the do until loop to look up values (in the relevant hash object) based on the incoming dataset name (from your master).

The preloading of hash objects will likely more than compensate for the cost of the if/then or select/when.

NB: There is no need to use macro code. Remember to add a stop statement after the do until loop.
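
The preloading approach described in this edit might look roughly like the sketch below. It is an illustration under stated assumptions, not the answerer's code: the tables work.lookup_a, work.lookup_b, and work.master, the output table work.want, and all sample values are hypothetical. It assumes that both lookup tables share the keys k1 to k4 and the data items d1 to d6, and that the master carries the table name in a variable called dsn.

```sas
/* Hypothetical sample data: two lookup tables with identical layouts */
data work.lookup_a;
    length k1 k2 k3 k4 $8 d1-d6 8;
    k1 = 'a'; k2 = 'b'; k3 = 'c'; k4 = 'd';
    d1 = 1; d2 = 2; d3 = 3; d4 = 4; d5 = 5; d6 = 6;
run;

data work.lookup_b;
    set work.lookup_a;
    d1 = 10;                /* differs from lookup_a so the source is visible */
run;

/* Hypothetical master: each row names the table to search */
data work.master;
    length dsn $41 k1 k2 k3 k4 $8;
    k1 = 'a'; k2 = 'b'; k3 = 'c'; k4 = 'd';
    dsn = 'work.lookup_a'; output;
    dsn = 'work.lookup_b'; output;
run;

data work.want;
    /* Never executes; at compile time it adds k1-k4 and d1-d6 to the PDV */
    if 0 then set work.lookup_a;

    /* Preload each lookup table once, using literal names */
    declare hash ha(dataset: "work.lookup_a");
    rc = ha.defineKey('k1', 'k2', 'k3', 'k4');
    rc = ha.defineData('d1', 'd2', 'd3', 'd4', 'd5', 'd6');
    rc = ha.defineDone();

    declare hash hb(dataset: "work.lookup_b");
    rc = hb.defineKey('k1', 'k2', 'k3', 'k4');
    rc = hb.defineData('d1', 'd2', 'd3', 'd4', 'd5', 'd6');
    rc = hb.defineDone();

    do until (eof_master);
        set work.master end=eof_master;
        call missing(d1, d2, d3, d4, d5, d6);   /* clear values from the previous row */
        select (lowcase(dsn));
            when ('work.lookup_a') rc = ha.find();
            when ('work.lookup_b') rc = hb.find();
            otherwise rc = -1;                  /* unknown table name: flag as not found */
        end;
        output;
    end;
    stop;   /* keep the step from looping back and reloading the hashes */
run;
```

Each hash object is loaded once before the loop, so the per-row cost is one select/when branch plus one in-memory find. A nonzero rc on an output row marks a lookup that failed, either because the key was absent or because the table name was not one of the preloaded ones.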

huangapple
  • Published on 2023-06-16 03:14:09
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76484855.html