英文:
fastest way to read and iterate over the rows of a table, and read/call a function with corresponding arguments and raw data from each row?
问题
我有两张表格。
第一张表格是一个名为tab的表格,其中包含了function_name和args列,如下所示:
tab:flip `function`root`args!(`symbol$();();());
insert[`tab](`.qa.test1;"ES";`startDate`endDate`roots`rollPerct`expDateThreshold`expDateThresholdExpiry!(2010.01.01d;2010.12.31;`ES;0.8;10;1));
insert[`tab](`.qa.test1;"NQ;SI";`startDate`endDate`roots`rollPerct`expDateThreshold`expDateThresholdExpiry!(2010.01.01d;2013.12.31;`NQ`SI;0.8;20;1));
insert[`tab](`.qa.test1;"ALL";`startDate`endDate`roots`rollPerct`expDateThreshold`expDateThresholdExpiry!(2010.01.01d;2013.12.31;`HQ`ZK`NA`PL`BZ;0.8;12;2));
第二张表格是一个名为raw的原始数据表格,如下所示:
//创建示例数据
raw:([]date:`date$();sym:`symbol$();name:`symbol$();expiration_date:`date$());
//修改数据以模拟要求
{`raw insert (x;`ESH10;`someName1;2010.01.10)} each (2010.01.01 + til 31);
{`raw insert (x;`ESH11;`someName2;2010.01.15)} each (2010.01.01 + til 31);
{`raw insert (x;`ESH12;`someName3;2010.01.20)} each (2010.01.01 + til 31);
{`raw insert (x;`ESH13;`someName4;2010.01.16)} each (2010.01.01 + til 31);
{`raw insert (x;`ESH14;`someName5;2010.01.13)} each (2010.01.01 + til 31);
{`raw insert (x;`ESH15;`someName6;2010.01.28)} each (2010.01.01 + til 31);
现在,我想要迭代表格tab的每一行,调用function列中编写的函数,同时将args和raw表格作为输入。类似于以下方式:
.qa.test1[(`startDate`endDate`roots`rollPerct`expDateThreshold`expDateThresholdExpiry!(2010.01.01d;2010.12.31d;`ES;0.8;10;1));raw]
.qa.test1[(`startDate`endDate`roots`rollPerct`expDateThreshold`expDateThresholdExpiry!(2010.01.01d;2013.12.31d;`NQ`SI;0.8;20;1));raw]
.qa.test1[(`startDate`endDate`roots`rollPerct`expDateThreshold`expDateThresholdExpiry!(2010.01.01d;2013.12.31d;`HQ`ZK`NA`PL`BZ;0.8;12;2));raw]
我提出的解决方案如下,虽然能够工作,但速度很慢。是否有更好的方法在kdb+中完成这个任务?
.qa.get_tab_main:{[raw]
// raw是具有原始数据的表格,.qa.test1接受两个参数.qa.test1[args;raw]
// 遍历表格行并将结果追加到tab表格中
cnt:(count tab);
ino:0;
res1::flip `date`sym`name`expiration_date!(`date$();`symbol$();`symbol$();`date$()); //从.qa.test1得到最终输出表的元数据
while[ino<cnt;
data:exec .[first function;(args;raw)] from tab where i=ino;
upsert[`tab;(select from data)];
ino:ino+1
];
res:`date xasc res1;
delete res1 from `.;
res
};
请注意,我处理的数据非常庞大,包含8000000行。
【翻译完毕】
英文:
I have 2 tables with me.
1st table is tab having funtion_name and args column like below
tab:flip `function`root`args!(`symbol$();();());
insert[`tab](`.qa.test1;"ES";`startDate`endDate`roots`rollPerct`expDateThreshold`expDateThresholdExpiry!(2010.01.01d;2010.12.31;`ES;0.8;10;1));
insert[`tab](`.qa.test1;"NQ;SI";`startDate`endDate`roots`rollPerct`expDateThreshold`expDateThresholdExpiry!(2010.01.01d;2013.12.31;`NQ`SI;0.8;20;1));
insert[`tab](`.qa.test1;"ALL";`startDate`endDate`roots`rollPerct`expDateThreshold`expDateThresholdExpiry!(2010.01.01d;2013.12.31;`HQ`ZK`NA`PL`BZ;0.8;12;2));
2nd table is raw having raw data like below
//create sample data
raw:([]date:`date$();sym:`symbol$();name:`symbol$();expiration_date:`date$());
//modify data to simulate requirement
{`raw insert (x;`ESH10;`someName1;2010.01.10)} each (2010.01.01 + til 31);
{`raw insert (x;`ESH11;`someName2;2010.01.15)} each (2010.01.01 + til 31);
{`raw insert (x;`ESH12;`someName3;2010.01.20)} each (2010.01.01 + til 31);
{`raw insert (x;`ESH13;`someName4;2010.01.16)} each (2010.01.01 + til 31);
{`raw insert (x;`ESH14;`someName5;2010.01.13)} each (2010.01.01 + til 31);
{`raw insert (x;`ESH15;`someName6;2010.01.28)} each (2010.01.01 + til 31);
Now I want Iterate each row of table tab to call the function written in the function column along with the args + raw table as input
some like
.qa.test1[(`startDate`endDate`roots`rollPerct`expDateThreshold`expDateThresholdExpiry!(2010.01.01d;2010.12.31d;`ES;0.8;10;1));raw]
.qa.test1[(`startDate`endDate`roots`rollPerct`expDateThreshold`expDateThresholdExpiry!(2010.01.01d;2013.12.31d;`NQ`SI;0.8;20;1));raw]
.qa.test1[(`startDate`endDate`roots`rollPerct`expDateThreshold`expDateThresholdExpiry!(2010.01.01d;2013.12.31d;`HQ`ZK`NA`PL`BZ;0.8;12;2));raw]
The solution I came up with is below, its working but its very slow.
Is there a Better Way of doing this in kdb+?
Note: The data I am dealing with is huge having 8000000 rows.
//solution which is slow
.qa.get_tab_main:{[raw]
//raw is the table having raw data, .qa.test1 accepts 2 args .qa.test1[args;raw]
//iterate over the table rows and append the result to tab table
cnt:(count tab);
ino:0;
res1::flip `date`sym`name`expiration_date!(`date$();`symbol$();`symbol$();`date$()); //meta of final output table from .qa.test1
while[ino<cnt;
data:exec .[first function;(args;raw)] from tab where i=ino;
upsert[`tab;(select from data)];
ino:ino+1
];
res:`date xasc res1;
delete res1 from `.;
res
};
答案1
得分: 2
q){x[`function] .(x[`args];raw)}each tab
将结果捕获到一个单一的表格中:
q)res:raze {x[`function] .(x[`args];raw)} each tab
根据函数内部涉及的逻辑,您可以尝试使用peach
来实现多核加速。
https://code.kx.com/q/basics/peach/
<details>
<summary>英文:</summary>
https://code.kx.com/q/ref/apply/
```q
q){x[`function] .(x[`args];raw)}each tab
To capture results in a single table:
q)res:raze {x[`function] .(x[`args];raw)} each tab
Depending on logic involved inside your functions you may be able to use peach
for multicore speedup.
https://code.kx.com/q/basics/peach/
答案2
得分: 1
使用qsql:
q)从tab中执行exec value each(function,'(enlist each args),\:enlist raw) from tab
如果您想要单个表输出,可以使用raze
。
英文:
Using qsql:
q)exec value each(function,'(enlist each args),\:enlist raw) from tab
You can raze
the result if you want a single table output.
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论