How to extract the first letter of each word in the "sym" column and create a new column called "derived" with unique values

Question

I have a table of events with roughly 5,000 unique events in it, and I want to generate my own short symbology for them. The new derived_sym column must be unique: for example, if FBS has already been generated and FBS comes up again for a different event, it should be numbered with 1, 2, 3, 4, 5, and so on.

Example table

//generate sample table
n: 30;  // Number of rows in the table
syms:("API Crude Oil Stock Change"; "Michigan Consumer Sentiment Final"; "Michigan Consumer Sentiment Prel"; "Inflation Rate YoY"; "FOMC Economic Projections"; "FOMC Minutes"; "Fed Barkin Speech"; "Fed Barr Testimony"; "Fed Beige Book"; "Fed Bostic Speech"; "Fed Bowman Speech"; "Fed Bullard Speech"; "Fed Chair Powell Speech"; "Fed Collins Speech"; "Fed Cook Speech"; "Fed Daly Speech"; "10-Year Note Auction");
tab:([] date: .z.d - n?30; ranks: n?100; sym: n?syms; price: "f"$n?100.5; recv_time: n?10:00:00.000 + n?10000000; is_active: "b"$n?1 0);
select by sym from tab

Desired output: shown as an image in the original post (not reproduced here).

Answer 1

Score: 3

You can create a dictionary based on your rule:

q)symLookup:exec sym!derived_sym from 
    update {x,'@[;0;:;""]string til count x} derived_sym by derived_sym from 
    update derived_sym:{first each " " vs x}each sym from 
    select distinct sym from tab
q)symLookup
"Fed Cook Speech"                  | "FCS"
"Fed Collins Speech"               | "FCS1"
"Fed Bostic Speech"                | "FBS"
"Fed Bowman Speech"                | "FBS1"
"10-Year Note Auction"             | "1NA"
"Fed Bullard Speech"               | "FBS2"
"Fed Beige Book"                   | "FBB"
"Fed Chair Powell Speech"          | "FCPS"
"Michigan Consumer Sentiment Final"| "MCSF"
"API Crude Oil Stock Change"       | "ACOSC"
"FOMC Minutes"                     | "FM"
"FOMC Economic Projections"        | "FEP"
"Inflation Rate YoY"               | "IRY"
"Fed Barkin Speech"                | "FBS3"

And then create the new column in the table:

q)update derived_sym:symLookup sym from tab
date       ranks sym                                 price     recv_time    is_active derived_sym
-------------------------------------------------------------------------------------------------
2023.05.31 15    "Fed Cook Speech"                   55.25425  11:32:21.596 1         "FCS"
2023.05.19 84    "Fed Collins Speech"                19.68259  10:38:48.058 1         "FCS1"
2023.05.31 82    "Fed Bostic Speech"                 56.43337  12:45:25.000 1         "FBS"
2023.05.14 90    "Fed Collins Speech"                7.079031  11:51:20.492 0         "FCS1"
2023.05.18 66    "Fed Bowman Speech"                 21.34627  12:13:45.275 1         "FBS1"
2023.05.29 2     "Fed Cook Speech"                   78.17714  11:30:50.872 0         "FCS"
2023.06.12 96    "Fed Cook Speech"                   48.68951  12:31:39.330 1         "FCS"
2023.06.01 93    "10-Year Note Auction"              68.62139  10:36:40.212 0         "1NA"
2023.05.26 5     "Fed Bullard Speech"                15.39931  11:33:14.916 1         "FBS2"
2023.05.31 58    "Fed Beige Book"                    53.77677  12:13:45.275 0         "FBB"
2023.05.26 31    "Fed Beige Book"                    45.96147  11:05:42.696 1         "FBB"
2023.06.01 7     "Fed Bowman Speech"                 0.8102834 11:05:42.696 0         "FBS1"
2023.05.18 53    "Fed Chair Powell Speech"           10.4454   12:34:35.078 0         "FCPS"
2023.05.28 38    "Fed Bullard Speech"                10.49734  12:31:11.038 1         "FBS2"
2023.06.01 23    "Michigan Consumer Sentiment Final" 33.96998  12:34:35.078 1         "MCSF"
2023.05.29 27    "API Crude Oil Stock Change"        48.85854  12:13:45.275 0         "ACOSC"
2023.06.06 32    "FOMC Minutes"                      48.83224  10:53:26.221 1         "FM"
2023.06.03 82    "API Crude Oil Stock Change"        98.46267  12:34:35.078 0         "ACOSC"
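
The suffix function inside the dictionary expression is fairly dense, so here is a small sketch that unpacks it on a fixed group. The names grp, nums, sfx and res are made up for illustration; the group stands in for what "update ... by derived_sym" would pass to the function for three events that share an abbreviation.

```q
/ a group of three identical abbreviations, as update ... by derived_sym would supply
grp:("FBS";"FBS";"FBS")

nums:string til count grp        / one numeric string per item: (,"0";,"1";,"2")
sfx:@[nums;0;:;""]               / blank out the first suffix: ("";,"1";,"2")
res:grp,'sfx                     / join pairwise: ("FBS";"FBS1";"FBS2")
```

The projection @[;0;:;""] in the answer does the same amend as the sfx line, only with the list argument supplied afterwards.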

Answer 2

Score: 2

Since "10-Year Note Auction" becomes "YNA", I'm assuming we first need to delete every character that is not a letter or a space.

This can be done in two steps: first generate the abbreviations, then add the number suffixes:

tab2:update derived_sym:first each/:" "vs/:sym inter\:(" ",.Q.A,.Q.a) from tab
update derived_sym:{x,'enlist[""],string 1+til count[x]-1}derived_sym by derived_sym from tab2

date       ranks sym                                 price     recv_time    is_active derived_sym
-------------------------------------------------------------------------------------------------
2023.05.31 15    "Fed Cook Speech"                   55.25425  11:32:21.596 1         "FCS"
2023.05.19 84    "Fed Collins Speech"                19.68259  10:38:48.058 1         "FCS1"
2023.05.31 82    "Fed Bostic Speech"                 56.43337  12:45:25.000 1         "FBS"
2023.05.14 90    "Fed Collins Speech"                7.079031  11:51:20.492 0         "FCS2"
2023.05.18 66    "Fed Bowman Speech"                 21.34627  12:13:45.275 1         "FBS1"
2023.05.29 2     "Fed Cook Speech"                   78.17714  11:30:50.872 0         "FCS3"
2023.06.12 96    "Fed Cook Speech"                   48.68951  12:31:39.330 1         "FCS4"
2023.06.01 93    "10-Year Note Auction"              68.62139  10:36:40.212 0         "YNA"
...
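
In the output above the suffix is assigned per row, so the three "Fed Cook Speech" rows end up as FCS, FCS3 and FCS4. If one symbol per distinct event is wanted, the character filtering from this answer could be combined with a lookup built over distinct names, as in the first answer. The following is only a sketch under that assumption: events, clean, abbrev, lookup and the other names are hypothetical, and skipping empty words after cleaning is my own addition.

```q
/ illustrative input with a repeated event (not taken from the sample table)
events:("Fed Cook Speech";"Fed Collins Speech";"10-Year Note Auction";"Fed Cook Speech")

/ keep letters and spaces only, as in the answer above
clean:{x inter " ",.Q.A,.Q.a}
/ first letter of each word; empty words left by cleaning are skipped (an assumption)
abbrev:{w:" " vs clean x; first each w where 0<count each w}

u:distinct events                 / one entry per distinct event
a:abbrev each u                   / raw abbreviations, possibly clashing
g:value group a                   / positions of each distinct abbreviation
/ within each group append "", "1", "2", ... in order of first appearance
s:raze {x,'@[string til count x;0;:;""]} each a g
lookup:u!@[a;raze g;:;s]          / event name -> unique derived symbol
derived:lookup events             / the same event always maps to the same symbol
```

On a table, the last step would correspond to something like update derived_sym:lookup sym from tab, with events replaced by the table's sym column.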

huangapple
  • Published on June 12, 2023, 19:08:59
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76456064.html