英文:
Extract rows with column name matching a specific pattern in kdb q
问题
以下是您要求的翻译部分:
我有一个包含多种字符字符串模式的表格。我需要提取特定模式的数据。
tableName: ([] description: ("ABC 1.45 Apr-24"; "ABC 1.34 1/2/23"; "DE 2.456 Mar-22"; "CBD 4/3/2023 5.78"; "XYZ 3.7 Jun-25"; "PQR 4.12 May-21"; "MNO 7.345 Jul-23"))
从上面的表格中,我需要提取符合模式"?* .** ???-??"的数据。
模式"?* .** ???-??"被分为三个部分:
- 第一部分 - 我需要它是字母,这是名称(可以是任意数量的字母)(例如 AB 或 ABC 或 ABCD 等)
- 第二部分 - 我需要它以小数值格式表示,可以是一个或多个小数值或最大可用值
- 第三部分 - 模式中的日期格式 APR-24
我尝试了以下,但没有显示任何数据
q)pattern: "^[A-Za-z]+ [0-9]+\\.[0-9]+ [A-Za-z]+-[0-9]+$" / 用于匹配的模式
q)result: select description from tableName where description like\: pattern
q)result
输出如下:
description
------------
"ABC 1.45 Apr-24"
"DE 2.456 Mar-22"
"XYZ 3.7 Jun-25"
"PQR 4.12 May-21"
"MNO 7.345 Jul-23"
英文:
I have a table with multiple patterns of char string. I need to extract the data with specific pattern .
tableName: ([] description: ("ABC 1.45 Apr-24"; "ABC 1.34 1/2/23"; "DE 2.456 Mar-22";"CBD 4/3/2023 5.78"; "XYZ 3.7 Jun-25"; "PQR 4.12 May-21"; "MNO 7.345 Jul-23"))
From the above table i need to extract data which is of pattern "?* .** ???-??"
The pattern "?* .** ???-??" is divided into three parts
- first part - i need it to be in letter which is the name(can be any number of letters) (ex. AB or ABC or ABCD etc)
- second part - i need it in decimal values format , can be more than one decimal value or max available
- third part - date format in the pattern APR-24
i tried the below , but does not show any data
q)pattern: "^[A-Za-z]+ [0-9]+\\.[0-9]+ [A-Za-z]+-[0-9]+$" / Pattern to match
q)result: select description from tableName where description like\: pattern
q)result
OUTPUT AS
description
------------
"ABC 1.45 Apr-24"
"DE 2.456 Mar-22"
"XYZ 3.7 Jun-25"
"PQR 4.12 May-21"
"MNO 7.345 Jul-23"
答案1
得分: 0
kdb+不实现完整的正则表达式模式匹配。
https://code.kx.com/q/basics/regex/
它支持?
*
[]
^
如果你需要完整支持,可以使用外部库:
- https://code.kx.com/q/basics/regex/#regex-libraries
- https://code.kx.com/developer/libraries/regular-expression-pcre2/
- https://github.com/KxSystems/ffi/blob/master/examples/pcre.q
- https://kx.com/blog/advanced-pattern-matching-and-text-manipulation-in-kdb-with-embedpy/
英文:
kdb+ does not implement full regular expression pattern matching.
https://code.kx.com/q/basics/regex/
It supports ?
*
[]
^
If you want full support you can use an external library:
答案2
得分: 0
以下是您要求的翻译:
这里是一个您可以完全在Q内部实现的方法:
tableName:update description:{" " vs x} each description from tableName; // 使用空格分隔
tableName:select from tableName where 3=count each description, all each description[;0] in .Q.A, all each description[;1] in (.Q.n,"."), description[;2] like "[A-Z][a-z][a-z]-[0-3][0-9]"; // 应用条件
tableName:update description:{" " sv x}each description from tableName; // 重新组合成字符串
首先,通过空格分隔字符串。然后,where 子句的四个元素是:
- 检查使用空格分隔后是否有3个元素。
- 检查第一个文本块中的所有元素是否都是大写字母 (.Q.A)。
- 检查第二个文本块中的所有元素是否都是数字和小数点 (.Q.n, ".")。
- 使用Q的内置正则表达式功能来检查最后一个文本块是否符合我们的模式。
结果:
description
------------------
"ABC 1.45 Apr-24"
"DE 2.456 Mar-22"
"XYZ 3.7 Jun-25"
"PQR 4.12 May-21"
"MNO 7.345 Jul-23"
英文:
Here's a method you could implement entirely within Q:
tableName:update description:{" " vs x} each description from tableName; // Split into components
tableName:select from tableName where 3=count each description, all each description[;0] in .Q.A, all each description[;1] in (.Q.n,"."), description[;2] like "[A-Z][a-z][a-z]-[0-3][0-9]"; // Apply clauses
tableName:update description:{" " sv x}each description from tableName; // Rejoin into strings
This first breaks up the strings by a space delimiter. Then the 4 elements of the where clause are:
- Check if there are 3 elements after slitting up by a space delimiter.
- Check if all elements in the first text block are Alphabetic Capitalised (.Q.A)
- Check if all elements in the second text block are Numeric and Decimal (.Q.n,".")
- Use Q's inbuilt regex functionality to check if the last text block adheres to our pattern
Result:
description
------------------
"ABC 1.45 Apr-24"
"DE 2.456 Mar-22"
"XYZ 3.7 Jun-25"
"PQR 4.12 May-21"
"MNO 7.345 Jul-23"
答案3
得分: 0
你可以像这样做:
q)tableName: ([] description: ("ABC 1.45 Apr-24"; "ABC 1.34 1/2/23"; "DE 2.456 Mar-22";"CBD 4/3/2023 5.78"; "XYZ 3.7 Jun-25"; "PQR 4.12 May-21"; "MNO 7.345 Jul-23"))
q)select from tableName where{p:" "vs x;all(all p[0]in .Q.a;not null"F"$p 1;any p[2]like/:"[a-z][a-z][a-z]-",\("[0-9]";"[0-9]");3=count p)}each lower description
description
------------------
"ABC 1.45 Apr-24"
"DE 2.456 Mar-22"
"XYZ 3.7 Jun-25"
"PQR 4.12 May-21"
"MNO 7.345 Jul-23"
对where
子句的解释:
p:" "vs x / 使用空格分隔字符串
all p[0]in .Q.a / 检查第一个部分是否全部为字母
not null"F"$p 1 / 将第二个部分的字符串转换为浮点数以检查是否不为空
any p[2]like/:"[a-z][a-z][a-z]-",\("[0-9]";"[0-9]") / 检查第三个部分是否匹配[a-z][a-z][a-z]-[0-9][0-9]或[a-z][a-z][a-z]-[0-9]
3=count p / 检查分割数量是否为3
编辑:根据@SeanHehir的建议进行了更新!
英文:
You can do something like this:
q)tableName: ([] description: ("ABC 1.45 Apr-24"; "ABC 1.34 1/2/23"; "DE 2.456 Mar-22";"CBD 4/3/2023 5.78"; "XYZ 3.7 Jun-25"; "PQR 4.12 May-21"; "MNO 7.345 Jul-23"))
q)select from tableName where{p:" "vs x;all(all p[0]in .Q.a;not null"F"$p 1;any p[2]like/:"[a-z][a-z][a-z]-",\("[0-9]";"[0-9]");3=count p)}each lower description
description
------------------
"ABC 1.45 Apr-24"
"DE 2.456 Mar-22"
"XYZ 3.7 Jun-25"
"PQR 4.12 May-21"
"MNO 7.345 Jul-23"
Explanation for the where clause:
p:" "vs x /split string by whitespace delimiter
all p[0]in .Q.a /checks 1st part are all alphabets
not null"F"$p 1 /cast 2nd part string to float to see it is not null
any p[2]like/:"[a-z][a-z][a-z]-",\("[0-9]";"[0-9]") /check 3rd part matches [a-z][a-z][a-z]-[0-9][0-9] OR [a-z][a-z][a-z]-[0-9]
3=count p /check the split count
EDIT: updated based on @SeanHehir suggestions!
答案4
得分: 0
以下是您要求的内容的中文翻译:
您可以在q中使用有限状态机来实现这个功能(https://code.kx.com/q/basics/regex/#regex-in-q)。但性能可能比这里发布的其他答案差。
m:(0;
{2*all(x?" ")#x in raze .Q`a`A};
{3*all(" "vs x)[1]in .Q.n,"."};
{4*x like"[A-Z][a-z][a-z]-[0-9][0-9]"}
)
f:{4=m/[1;" "vs x]}
如果您事先知道您的列只包含2个空格:
q)从表名选择f每个描述
描述
------------------
"ABC 1.45 Apr-24"
"DE 2.456 Mar-22"
"XYZ 3.7 Jun-25"
"PQR 4.12 May-21"
"MNO 7.345 Jul-23"
否则,您也可以根据这个条件进行过滤:
q)从表名选择从表名选择2=每个" "=描述,f每个描述
描述
------------------
"ABC 1.45 Apr-24"
"DE 2.456 Mar-22"
"XYZ 3.7 Jun-25"
"PQR 4.12 May-21"
"MNO 7.345 Jul-23"
英文:
You can implement this using a finite state machine in q also (https://code.kx.com/q/basics/regex/#regex-in-q). But the performance is likely worse than the other answers posted here.
m:(0;
{2*all(x?" ")#x in raze .Q`a`A};
{3*all(" "vs x)[1]in .Q.n,"."};
{4*x like"[A-Z][a-z][a-z]-[0-9][0-9]"}
)
f:{4=m/[1;" "vs x]}
If you know in advance that your column will only contain 2 spaces:
q)select from tableName where f each description
description
------------------
"ABC 1.45 Apr-24"
"DE 2.456 Mar-22"
"XYZ 3.7 Jun-25"
"PQR 4.12 May-21"
"MNO 7.345 Jul-23"
Otherwise, you can filter on that condition also:
q)select from tableName where 2=sum each" "=description,f each description
description
------------------
"ABC 1.45 Apr-24"
"DE 2.456 Mar-22"
"XYZ 3.7 Jun-25"
"PQR 4.12 May-21"
"MNO 7.345 Jul-23"
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论