提取具有与特定模式匹配的列名的行在kdb q中

huangapple go评论109阅读模式
英文:

Extract rows with column name matching a specific pattern in kdb q

问题

以下是您要求的翻译部分:

我有一个包含多种字符字符串模式的表格。我需要提取特定模式的数据。

tableName: ([] description: ("ABC 1.45 Apr-24"; "ABC 1.34 1/2/23"; "DE 2.456 Mar-22"; "CBD 4/3/2023 5.78"; "XYZ 3.7 Jun-25"; "PQR 4.12 May-21"; "MNO 7.345 Jul-23"))

从上面的表格中,我需要提取符合模式"?* .** ???-??"的数据。

模式"?* .** ???-??"被分为三个部分:

  1. 第一部分 - 我需要它是字母,这是名称(可以是任意数量的字母)(例如 AB 或 ABC 或 ABCD 等)
  2. 第二部分 - 我需要它以小数值格式表示,可以是一个或多个小数值或最大可用值
  3. 第三部分 - 模式中的日期格式 APR-24

我尝试了以下,但没有显示任何数据

q)pattern: "^[A-Za-z]+ [0-9]+\\.[0-9]+ [A-Za-z]+-[0-9]+$"  / 用于匹配的模式

q)result: select description from tableName where description like\: pattern

q)result

输出如下:

description
------------
"ABC 1.45 Apr-24"
"DE 2.456 Mar-22"
"XYZ 3.7 Jun-25"
"PQR 4.12 May-21"
"MNO 7.345 Jul-23"
英文:

I have a table with multiple patterns of char string. I need to extract the data with specific pattern .

tableName: ([] description: ("ABC 1.45 Apr-24"; "ABC 1.34 1/2/23"; "DE 2.456 Mar-22";"CBD 4/3/2023 5.78"; "XYZ 3.7 Jun-25"; "PQR 4.12 May-21"; "MNO 7.345 Jul-23"))

From the above table i need to extract data which is of pattern "?* .** ???-??"

The pattern "?* .** ???-??" is divided into three parts

  1. first part - i need it to be in letter which is the name(can be any number of letters) (ex. AB or ABC or ABCD etc)
  2. second part - i need it in decimal values format , can be more than one decimal value or max available
  3. third part - date format in the pattern APR-24

i tried the below , but does not show any data

q)pattern: "^[A-Za-z]+ [0-9]+\\.[0-9]+ [A-Za-z]+-[0-9]+$"  / Pattern to match

q)result: select description from tableName where description like\: pattern

q)result

OUTPUT AS

description
------------
"ABC 1.45 Apr-24"
"DE 2.456 Mar-22"
"XYZ 3.7 Jun-25"
"PQR 4.12 May-21"
"MNO 7.345 Jul-23"

答案1

得分: 0

kdb+不实现完整的正则表达式模式匹配。

https://code.kx.com/q/basics/regex/

它支持? * [] ^

如果你需要完整支持,可以使用外部库:

英文:

kdb+ does not implement full regular expression pattern matching.

https://code.kx.com/q/basics/regex/

It supports ? * [] ^

If you want full support you can use an external library:

答案2

得分: 0

以下是您要求的翻译:

这里是一个您可以完全在Q内部实现的方法:

tableName:update description:{" " vs x} each description from tableName; // 使用空格分隔
tableName:select from tableName where 3=count each description, all each description[;0] in .Q.A, all each description[;1] in (.Q.n,"."), description[;2] like "[A-Z][a-z][a-z]-[0-3][0-9]"; // 应用条件
tableName:update description:{" " sv x}each description from tableName; // 重新组合成字符串

首先,通过空格分隔字符串。然后,where 子句的四个元素是:

  1. 检查使用空格分隔后是否有3个元素。
  2. 检查第一个文本块中的所有元素是否都是大写字母 (.Q.A)。
  3. 检查第二个文本块中的所有元素是否都是数字和小数点 (.Q.n, ".")。
  4. 使用Q的内置正则表达式功能来检查最后一个文本块是否符合我们的模式。

结果:

description       
------------------
"ABC 1.45 Apr-24" 
"DE 2.456 Mar-22" 
"XYZ 3.7 Jun-25"  
"PQR 4.12 May-21" 
"MNO 7.345 Jul-23"
英文:

Here's a method you could implement entirely within Q:

tableName:update description:{" " vs x} each description from tableName; // Split into components
tableName:select from tableName where 3=count each description, all each description[;0] in .Q.A, all each description[;1] in (.Q.n,"."), description[;2] like "[A-Z][a-z][a-z]-[0-3][0-9]"; // Apply clauses
tableName:update description:{" " sv x}each description from tableName; // Rejoin into strings

This first breaks up the strings by a space delimiter. Then the 4 elements of the where clause are:

  1. Check if there are 3 elements after slitting up by a space delimiter.
  2. Check if all elements in the first text block are Alphabetic Capitalised (.Q.A)
  3. Check if all elements in the second text block are Numeric and Decimal (.Q.n,".")
  4. Use Q's inbuilt regex functionality to check if the last text block adheres to our pattern

Result:

description       
------------------
"ABC 1.45 Apr-24" 
"DE 2.456 Mar-22" 
"XYZ 3.7 Jun-25"  
"PQR 4.12 May-21" 
"MNO 7.345 Jul-23"

答案3

得分: 0

你可以像这样做:

q)tableName: ([] description: ("ABC 1.45 Apr-24"; "ABC 1.34 1/2/23"; "DE 2.456 Mar-22";"CBD 4/3/2023 5.78"; "XYZ 3.7 Jun-25"; "PQR 4.12 May-21"; "MNO 7.345 Jul-23"))
q)select from tableName where{p:" "vs x;all(all p[0]in .Q.a;not null"F"$p 1;any p[2]like/:"[a-z][a-z][a-z]-",\("[0-9]";"[0-9]");3=count p)}each lower description
description
------------------
"ABC 1.45 Apr-24"
"DE 2.456 Mar-22"
"XYZ 3.7 Jun-25"
"PQR 4.12 May-21"
"MNO 7.345 Jul-23"

where子句的解释:

p:" "vs x / 使用空格分隔字符串
all p[0]in .Q.a / 检查第一个部分是否全部为字母
not null"F"$p 1 / 将第二个部分的字符串转换为浮点数以检查是否不为空
any p[2]like/:"[a-z][a-z][a-z]-",\("[0-9]";"[0-9]") / 检查第三个部分是否匹配[a-z][a-z][a-z]-[0-9][0-9]或[a-z][a-z][a-z]-[0-9]
3=count p / 检查分割数量是否为3

编辑:根据@SeanHehir的建议进行了更新!

英文:

You can do something like this:

q)tableName: ([] description: ("ABC 1.45 Apr-24"; "ABC 1.34 1/2/23"; "DE 2.456 Mar-22";"CBD 4/3/2023 5.78"; "XYZ 3.7 Jun-25"; "PQR 4.12 May-21"; "MNO 7.345 Jul-23"))
q)select from tableName where{p:" "vs x;all(all p[0]in .Q.a;not null"F"$p 1;any p[2]like/:"[a-z][a-z][a-z]-",\("[0-9]";"[0-9]");3=count p)}each lower description
description
------------------
"ABC 1.45 Apr-24"
"DE 2.456 Mar-22"
"XYZ 3.7 Jun-25"
"PQR 4.12 May-21"
"MNO 7.345 Jul-23"

Explanation for the where clause:

p:" "vs x /split string by whitespace delimiter
all p[0]in .Q.a /checks 1st part are all alphabets
not null"F"$p 1 /cast 2nd part string to float to see it is not null
any p[2]like/:"[a-z][a-z][a-z]-",\("[0-9]";"[0-9]") /check 3rd part matches [a-z][a-z][a-z]-[0-9][0-9] OR [a-z][a-z][a-z]-[0-9]
3=count p /check the split count

EDIT: updated based on @SeanHehir suggestions!

答案4

得分: 0

以下是您要求的内容的中文翻译:

您可以在q中使用有限状态机来实现这个功能(https://code.kx.com/q/basics/regex/#regex-in-q)。但性能可能比这里发布的其他答案差。

m:(0;
   {2*all(x?" ")#x in raze .Q`a`A};
   {3*all(" "vs x)[1]in .Q.n,"."};
   {4*x like"[A-Z][a-z][a-z]-[0-9][0-9]"}
   )
f:{4=m/[1;" "vs x]}

如果您事先知道您的列只包含2个空格:

q)从表名选择f每个描述
描述
------------------
"ABC 1.45 Apr-24"
"DE 2.456 Mar-22"
"XYZ 3.7 Jun-25"
"PQR 4.12 May-21"
"MNO 7.345 Jul-23"

否则,您也可以根据这个条件进行过滤:

q)从表名选择从表名选择2=每个" "=描述,f每个描述
描述
------------------
"ABC 1.45 Apr-24"
"DE 2.456 Mar-22"
"XYZ 3.7 Jun-25"
"PQR 4.12 May-21"
"MNO 7.345 Jul-23"
英文:

You can implement this using a finite state machine in q also (https://code.kx.com/q/basics/regex/#regex-in-q). But the performance is likely worse than the other answers posted here.

m:(0;
   {2*all(x?" ")#x in raze .Q`a`A};
   {3*all(" "vs x)[1]in .Q.n,"."};
   {4*x like"[A-Z][a-z][a-z]-[0-9][0-9]"}
   )
f:{4=m/[1;" "vs x]}

If you know in advance that your column will only contain 2 spaces:

q)select from tableName where f each description
description
------------------
"ABC 1.45 Apr-24"
"DE 2.456 Mar-22"
"XYZ 3.7 Jun-25"
"PQR 4.12 May-21"
"MNO 7.345 Jul-23"

Otherwise, you can filter on that condition also:

q)select from tableName where 2=sum each" "=description,f each description
description
------------------
"ABC 1.45 Apr-24"
"DE 2.456 Mar-22"
"XYZ 3.7 Jun-25"
"PQR 4.12 May-21"
"MNO 7.345 Jul-23"

huangapple
  • 本文由 发表于 2023年6月15日 12:58:48
  • 转载请务必保留本文链接:https://go.coder-hub.com/76479253.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定