正则表达式来查找源和目标

huangapple go评论75阅读模式
英文:

Regular expression to find source and target

问题

需要帮忙编写通用正则表达式以查找以下情景中的源表:

示例1输出:

xyz.test

示例2输出:无输出,因为sample.Test是目标表而不是源表。

我尝试过下面的正则表达式 \b(?!DELETE\s+FROM\s+)(?:FROM|JOIN)\s+(\w+./*[^\s]+),它对第一个情景起作用,但对第二个情景预期没有输出,然而我得到了sample.test。

英文:

Need help to write generic regular expression for to find the source table from the below scenarios:

Sample 1:

INSERT INTO a.test
(
test1,
test2
)
select 
test1,
test2
from 
xyz.test

Output 1:

xyz.test 

Sample 2:

*******************************************************                                                                                
    MACRO                                                                 
        abc.test
    PURPOSE                                   
         TO DELETE THE DATA FROM abc.test
    INPUT PARAMS                              
                   NONE                                                                      
    CALLED BY                                                                            
        THIS MACRO IS BEING CALLED FROM sql
*******************************************************/ 
    DELETE FROM sample.test

No output for this as sample.Test is target table not source.

I have tried below regex \b(?!DELETE\s+FROM\s+)(?:FROM|JOIN)\s+(\w+./*[^\s]+) it's working for 1st scenario but for 2nd scenario I was expecting no output however I am getting sample.test

答案1

得分: 2

I really don't think that a regular expression is the safe way to go.
There are really too many things to handle, such as comments, unions, etc.

I had a play with the
SQLGlot library with this code:

##python3 -m pip install sqlglot

from sqlglot import parse_one, exp

queries = [
INSERT INTO a.test
(
test1,
test2
)
select 
test1,
test2
from 
xyz.test

Source tables:
['xyz.test']



/*******************************************************                                                        

    MACRO
        abc.test
    PURPOSE
         TO DELETE THE DATA FROM abc.test
    INPUT PARAMS
                   NONE
    CALLED BY
        THIS MACRO IS BEING CALLED FROM sql
*******************************************************/
    DELETE FROM sample.test

Source tables:
[]



SELECT
  t1.firstname,
  t1.lastname,
  t2.email,
  t3.order
FROM abc.table1 t1
LEFT JOIN abc.table2 t2 ON t2.id = t1.id
LEFT JOIN abc.table3 t3 ON t3.email = t2.email

Source tables:
['abc.table1 AS t1', 'abc.table2 AS t2', 'abc.table3 AS t3']



SELECT id, name FROM table1
UNION
SELECT id, name FROM table2

Source tables:
['table1', 'table2']
英文:

I really don't think that a regular expression is the safe way to go.
There are really too many things to handle, such as comments, unions, etc.

I had a play with the
SQLGlot library with this code:

##python3 -m pip install sqlglot

from sqlglot import parse_one, exp

queries = [
"""
INSERT INTO a.test
(
test1,
test2
)
select 
test1,
test2
from 
xyz.test
""",
"""
/*******************************************************                                                                                
    MACRO                                                                 
        abc.test
    PURPOSE                                   
         TO DELETE THE DATA FROM abc.test
    INPUT PARAMS                              
                   NONE                                                                      
    CALLED BY                                                                            
        THIS MACRO IS BEING CALLED FROM sql
*******************************************************/ 
    DELETE FROM sample.test
""",
"""
SELECT
  t1.firstname,
  t1.lastname,
  t2.email,
  t3.order
FROM abc.table1 t1
LEFT JOIN abc.table2 t2 ON t2.id = t1.id
LEFT JOIN abc.table3 t3 ON t3.email = t2.email
""",
"""
SELECT id, name FROM table1
UNION
SELECT id, name FROM table2
"""
]

for query in queries:
    print(query)
    sourceTables = []
    for select in parse_one(query).find_all(exp.Select):
        for table in select.find_all(exp.Table):
            sourceTables.append(str(table))
    print("Source tables:")
    print(sourceTables)
    print("\n")

It outputs the following:

INSERT INTO a.test
(
test1,
test2
)
select 
test1,
test2
from 
xyz.test

Source tables:
['xyz.test']



/*******************************************************                                                        

    MACRO
        abc.test
    PURPOSE
         TO DELETE THE DATA FROM abc.test
    INPUT PARAMS
                   NONE
    CALLED BY
        THIS MACRO IS BEING CALLED FROM sql
*******************************************************/
    DELETE FROM sample.test

Source tables:
[]



SELECT
  t1.firstname,
  t1.lastname,
  t2.email,
  t3.order
FROM abc.table1 t1
LEFT JOIN abc.table2 t2 ON t2.id = t1.id
LEFT JOIN abc.table3 t3 ON t3.email = t2.email

Source tables:
['abc.table1 AS t1', 'abc.table2 AS t2', 'abc.table3 AS t3']



SELECT id, name FROM table1
UNION
SELECT id, name FROM table2

Source tables:
['table1', 'table2']

Seems to do what you wanted...

答案2

得分: 0

以下将捕获来自所提供示例的值。

注意,要求“delete”,“from”或“join”是行上第一个非空白字符序列。

(?is)^\s*(?:delete\s+)?(?:from|join)\s*.*?([^\s]+)

(?is) 将启用_忽略大小写_ 和 单行 模式。

输出

xyz.test
sample.test
英文:

The following will capture the values from both of the provided examples.

As a note, a requirement is that the "delete", "from", or "join" are the first non-white-space character sequences on the line.

(?is)^\s*(?:delete\s+)?(?:from|join)\s*.*?([^\s]+)

The (?is) will toggle-on the ignore case, and single-line modes.

Output

xyz.test
sample.test

huangapple
  • 本文由 发表于 2023年6月1日 19:01:07
  • 转载请务必保留本文链接:https://go.coder-hub.com/76381207.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定