Find Specific Pattern in String via SQL
Question
I have a column that we'll call remarks. It is basically free-text notes.
I want to see if I can extract exact patterns of letters and digits from these notes. See the example:
REMARKS
'This is the magic key 24HMBC123456 dont lose it ex644'
'Code 31hbxx123456 is tested good px543'
'rt445 has tested 61CGRW123456 as good jx163'
For the column above, I'd like to test for and extract a pattern of 2 digits followed by 4 letters followed by 6 digits, but I don't know how to code that to get just:
24HMBC123456
31hbxx123456
61CGRW123456
Any help is appreciated.
I thought of using a CASE WHEN with REGEXP_INSTR mixed with some substrings, but either I'm overcomplicating it or it really is that complicated.
Answer 1
Score: 1
To filter your column, you can use:
SELECT
REGEXP_SUBSTR(column_name, '[0-9]{2}[aA-zZ]{4}[0-9]{6}') AS keywords
FROM
table_name
WHERE
NULLIF( REGEXP_SUBSTR(column_name, '[0-9]{2}[aA-zZ]{4}[0-9]{6}') , '' )
IS NOT NULL
NOTE
My SQL is a little rusty, so try it for yourself.
The regex should work according to this tester.
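If no online tester is at hand, the regex itself can be checked outside the database. The following is a minimal sketch in Python, not part of the original answer: it runs the pattern (with the letter class written as `[A-Za-z]`) against the three sample remarks from the question. The helper name `first_key` is made up for illustration, and Python's `re` engine is assumed to behave like the database's regex engine for this simple pattern.

```python
import re

# 2 digits, 4 letters, 6 digits; mirrors the pattern passed to REGEXP_SUBSTR.
PATTERN = re.compile(r"[0-9]{2}[A-Za-z]{4}[0-9]{6}")

# Sample remarks copied from the question.
remarks = [
    "This is the magic key 24HMBC123456 dont lose it ex644",
    "Code 31hbxx123456 is tested good px543",
    "rt445 has tested 61CGRW123456 as good jx163",
]


def first_key(text):
    """Return the first match in text, or None (hypothetical helper)."""
    match = PATTERN.search(text)
    return match.group(0) if match else None


keys = [first_key(remark) for remark in remarks]
print(keys)
```

Shorter tokens such as ex644 or rt445 are not picked up, because they do not contain 2 digits followed by 4 letters and 6 digits.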
EDIT 1:
If your search pattern can occur multiple times per row,
this script can help:
SELECT
REGEXP_SUBSTR(column_name,'[0-9]{2}[aA-zZ]{4}[0-9]{6}',1,LEVEL)
FROM
table_name
CONNECT BY LEVEL <=REGEXP_COUNT(column_name,'[0-9]{2}[aA-zZ]{4}[0-9]{6}')
To see which row each match came from, you could expand it to:
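SELECT
primKey AS RowID,
REGEXP_SUBSTR(column_name,'[0-9]{2}[aA-zZ]{4}[0-9]{6}',1,LEVEL) AS Pattern
FROM
table_name
CONNECT BY LEVEL <=REGEXP_COUNT(column_name,'[0-9]{2}[aA-zZ]{4}[0-9]{6}')
-- Assuming Oracle: the two conditions below keep each generated level tied to its own row,
-- otherwise rows are combined with each other and matches are duplicated.
AND PRIOR primKey = primKey
AND PRIOR SYS_GUID() IS NOT NULL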
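The idea behind the multi-match query, one output row per occurrence of the pattern, can be sketched outside SQL as well. The Python below is an illustration under assumptions, not the answer's method: the `rows` dictionary stands in for the table (key as `primKey`, value as the remarks column), its contents are invented sample data, and `explode` is a hypothetical helper name. Unlike the SQL version, rows without any match simply produce no output here.

```python
import re

PATTERN = re.compile(r"[0-9]{2}[A-Za-z]{4}[0-9]{6}")

# Invented stand-in for the table: primKey -> remarks text.
rows = {
    1: "keys 24HMBC123456 and 31hbxx123456 were issued",
    2: "no key in this remark",
    3: "only 61CGRW123456 here",
}


def explode(table):
    """Return (row id, occurrence number, matched text) for every match."""
    result = []
    for row_id, text in table.items():
        # enumerate(..., start=1) plays the role of LEVEL in the SQL query.
        for occurrence, found in enumerate(PATTERN.findall(text), start=1):
            result.append((row_id, occurrence, found))
    return result


matches = explode(rows)
for row in matches:
    print(row)
```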
EDIT 2:
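Based on Fred's comment, I replaced the [aA-zZ] regex term with [A-Za-z].
In ASCII, the range A-z also covers the punctuation characters that sit between Z and a, so [aA-zZ] matches more than letters.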
SELECT
REGEXP_SUBSTR(column_name,'[0-9]{2}[A-Za-z]{4}[0-9]{6}',1,LEVEL)
FROM
table_name
CONNECT BY LEVEL <=REGEXP_COUNT(column_name,'[0-9]{2}[A-Za-z]{4}[0-9]{6}')
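To see why the character class matters, here is a small Python sketch (an illustration added for clarity, assuming ASCII ordering of characters as in Python's `re`). The sample string is invented: it puts four punctuation characters where the letters should be.

```python
import re

loose = re.compile(r"[0-9]{2}[aA-zZ]{4}[0-9]{6}")    # range A-z includes [ \ ] ^ _ `
strict = re.compile(r"[0-9]{2}[A-Za-z]{4}[0-9]{6}")  # letters only

# Invented sample: 2 digits, 4 punctuation characters, 6 digits.
sample = "12[_^]123456"

loose_hit = loose.search(sample) is not None
strict_hit = strict.search(sample) is not None
print(loose_hit, strict_hit)
```

The loose class accepts the sample even though it contains no letters, while the strict class rejects it and still matches real keys such as 24HMBC123456.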