Find Specific Pattern in String via SQL

Question

I have a column that we'll call remarks. It basically holds notes.
I want to see whether I can extract exact patterns of letters and digits from these notes. See the example:

 REMARKS                           
 'This is the magic key 24HMBC123456 dont lose it ex644' 
 'Code 31hbxx123456 is tested good px543' 
 'rt445 has tested 61CGRW123456 as good jx163'

For the column above, I'd like to test for and extract a pattern of 2 digits followed by 4 letters followed by 6 digits, but I don't know how to code that to get just:

    24HMBC123456
    31hbxx123456
    61CGRW123456 

Any help is appreciated.

I thought of using a CASE WHEN with REGEXP_INSTR mixed with some substrings, but either I'm overcomplicating it or it really is that complicated.

Answer 1

Score: 1

To filter your column, you can use:

SELECT
   REGEXP_SUBSTR(column_name, '[0-9]{2}[aA-zZ]{4}[0-9]{6}') AS keywords
FROM
   table_name
WHERE
   NULLIF( REGEXP_SUBSTR(column_name, '[0-9]{2}[aA-zZ]{4}[0-9]{6}') , '' )
   IS NOT NULL

NOTE

My SQL is a little rusty, so try it for yourself.

The regex should work according to this tester.
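
As a quick stand-in for an online tester, the pattern can be checked against the three sample remarks from the question. This sketch uses Python's `re` module rather than a SQL engine, so it only confirms the regular expression itself. `extract_first` is a hypothetical helper that mimics a two-argument REGEXP_SUBSTR, and database regex dialects may differ in detail.

```python
import re

# Pattern exactly as written in the query above: 2 digits, 4 "letters", 6 digits.
PATTERN = re.compile(r"[0-9]{2}[aA-zZ]{4}[0-9]{6}")

# Sample remarks copied from the question.
remarks = [
    "This is the magic key 24HMBC123456 dont lose it ex644",
    "Code 31hbxx123456 is tested good px543",
    "rt445 has tested 61CGRW123456 as good jx163",
]


def extract_first(text):
    """Return the first match or None (hypothetical helper, like REGEXP_SUBSTR)."""
    match = PATTERN.search(text)
    return match.group(0) if match else None


keys = [extract_first(remark) for remark in remarks]
print(keys)  # ['24HMBC123456', '31hbxx123456', '61CGRW123456']
```

The short codes such as ex644 and rt445 are skipped because they never supply two digits followed by four letters.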

EDIT 1:

If your search pattern can exist multiple times per row,
this script can help:

SELECT 
   REGEXP_SUBSTR(column_name,'[0-9]{2}[aA-zZ]{4}[0-9]{6}',1,LEVEL) 
FROM 
   table_name
CONNECT BY LEVEL <=REGEXP_COUNT(column_name,'[0-9]{2}[aA-zZ]{4}[0-9]{6}')

To gather some more statistics, you could expand it to:

SELECT 
   primKey AS RowID,
   REGEXP_SUBSTR(column_name,'[0-9]{2}[aA-zZ]{4}[0-9]{6}',1,LEVEL) AS Pattern
FROM 
   table_name
CONNECT BY LEVEL <=REGEXP_COUNT(column_name,'[0-9]{2}[aA-zZ]{4}[0-9]{6}')
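
To make the intended result shape concrete, here is a rough Python sketch of "one output row per match". The rows and their `primKey` values are hypothetical, and `re.findall` stands in for calling REGEXP_SUBSTR with an increasing occurrence number. It illustrates the expected output, not how the database evaluates the query.

```python
import re

# Pattern as written in the queries above.
PATTERN = re.compile(r"[0-9]{2}[aA-zZ]{4}[0-9]{6}")

# Hypothetical rows of (primKey, remark); row 2 holds two keys, row 3 holds none.
rows = [
    (1, "Code 31hbxx123456 is tested good px543"),
    (2, "swap 24HMBC123456 for 61CGRW123456 today"),
    (3, "nothing to see here"),
]

# One (RowID, Pattern) pair per occurrence, in order of appearance.
matches = [(key, hit) for key, text in rows for hit in PATTERN.findall(text)]

for row_id, pattern in matches:
    print(row_id, pattern)
```

If the SQL version returns more rows than this, that is worth investigating. In Oracle, a CONNECT BY LEVEL clause with no condition tying each row to itself is commonly reported to multiply rows across the whole table.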

EDIT 2:

Based on Fred's comment, I replaced the [aA-zZ] regex term.

SELECT 
   REGEXP_SUBSTR(column_name,'[0-9]{2}[A-Za-z]{4}[0-9]{6}',1,LEVEL) 
FROM 
   table_name
CONNECT BY LEVEL <=REGEXP_COUNT(column_name,'[0-9]{2}[A-Za-z]{4}[0-9]{6}')
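
The reason for the change can be seen in a small check. In ASCII, the range A-z also covers the six characters between Z and a (the square brackets, backslash, caret, underscore, and backtick), so [aA-zZ] accepts more than letters. The sketch below uses Python's `re` module and a made-up sample string. Engines that order ranges by collation rather than code point may behave differently.

```python
import re

# The original letter class versus the replacement from EDIT 2.
loose = re.compile(r"[0-9]{2}[aA-zZ]{4}[0-9]{6}")
strict = re.compile(r"[0-9]{2}[A-Za-z]{4}[0-9]{6}")

# Hypothetical remark whose "letters" include underscores.
sample = "ref 12__AB123456 is not a valid key"

loose_hit = loose.search(sample)
strict_hit = strict.search(sample)

print(loose_hit.group(0) if loose_hit else None)   # 12__AB123456
print(strict_hit.group(0) if strict_hit else None)  # None
```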

huangapple
  • Posted on 2023-06-22 05:40:09
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76527330.html