2023年6月2日 00:12:18go评论68阅读模式

英文:

How do I remove comments from an Oracle SQL script using pyparsing in Python 3.x?

问题

我理解你的问题，你想要解决的是从包含多个SQL语句的Oracle SQL脚本中删除注释，并将脚本拆分成独立的语句。你已经使用了pyparsing库来处理单行注释和多行注释，但遇到了一些问题。以下是你可能想要考虑的一些建议：

解决多行注释问题：
run_tests方法似乎不适用于多行注释。你可以考虑使用parseString方法来测试多行注释，这将允许你更好地控制输入字符串和检查解析的结果。
处理注释和SQL标记的重叠问题：
当注释与SQL标记重叠时，你可以考虑按顺序处理它们，先检查注释，然后再检查SQL标记。例如，可以首先使用单行注释和多行注释的语法来处理注释，然后再尝试解析SQL标记。这样，你可以确保注释不会被错误地解释为SQL标记。
处理有效字符中的 "-" 问题：
你提到 "-" 字符可能会导致问题。为了解决这个问题，你可以在合适的地方使用转义字符来处理 "-"，以确保它不会被错误地解释为注释或语句的一部分。
结构化代码：
对于复杂的问题，良好的代码结构非常重要。你可以将不同的功能模块化，例如一个模块处理注释，另一个模块处理SQL标记。这将使你的代码更易于理解和维护。

最后，确保在处理Oracle SQL脚本时考虑各种边缘情况，以确保你的解决方案具有足够的鲁棒性。如果你需要更多的帮助或示例代码，请随时提出具体的问题。

英文:

I have an oracle sql script containing multiple statements which are separated by ";". I'm trying to remove all comments within this entire script. This includes:

single line comments -- till a newline.
block comments /* ... */
Additionally, I would like to be able to split the entire script into its statements.

I know of the library called sqlparse that can format to remove code, but I came across this edge case:

edge_case = &quot;&quot;&quot;myval := oneval || otherval ||--;
-- single comment line
lpad(nvl(myval, &#39; &#39;), 10, &#39; &#39;);
&quot;&quot;&quot;

formatted = sqlparse.format(edge_case, strip_comments=True)
sqlparse.split(formatted)

# returns [&#39;myval := oneval || otherval ||--;&#39;, &quot;lpad(nvl(myval, &#39; &#39;), 10, &#39; &#39;);&quot;]
# it splits into 2 statements when it should only be 1!

My idea is to focus on each commenting case, and then try to parse the rest of non-commented code by just string tokens which consists of all valid characters in sql.

from pyparsing import *

# define basic elements
semicolon = Literal(&quot;;&quot;)
# all valid chars excluding semicolon
all_valid_chars = alphanums + &quot;_-+*/.,:()[]{}&lt;&gt;=&amp;|&quot;

# define an sql token
sql_token = Word(all_valid_chars)


# need to make sure that &quot;--&quot; is a special case
single_line_comment_token = Literal(&quot;--&quot;)
single_line_comment = single_line_comment_token + SkipTo(line_end) + line_end
test1_single_line_comment = &quot;&quot;&quot;
-- single comment line
--- also a comment
---- also a comment
-- comment -- continues
-- comment /* also continues */

# - not a comment
&quot;&quot;&quot;
single_line_comment.run_tests(test1_single_line_comment)


# also handle multi line comments
multi_line_comment_open = Literal(&quot;/*&quot;)
multi_line_comment_close = Literal(&quot;*/&quot;)
multi_line_comment = multi_line_comment_open + SkipTo(multi_line_comment_close) + multi_line_comment_close

test1_multi_line_comment = &quot;&quot;&quot;
/* multi line comment */
/* outer comment /* inner comment */

/* /* /* still valid */
/* -- still valid */
&quot;&quot;&quot;
multi_line_comment.run_tests(test1_multi_line_comment)

test2_multi_line_comment = &quot;&quot;&quot;
/* multi line comment /* with nested comment should fail! */ */
&quot;&quot;&quot;
multi_line_comment.run_tests(test2_multi_line_comment, failure_tests=True)

I'm now stuck here and not sure where to continue.

statement2 = OneOrMore(single_line_comment | multi_line_comment | sql_token) + semicolon

# run tests
print(statement2.parseString(&quot;myval := oneval || otherval ||--;&quot;))

My issues:

The tests for each of these cases work, but I dont know how I should be testing the together to cover overlaps eg. -- /* this should be a single line comment etc.
I don't know how to handle the rest of the non commented code properly. For example, all valid characters should include -, but this should mess up my single line code?
Less technical issues faced:
Honestly not sure where to even begin or how to structure my code since these cases can overlap. I'm not even sure if pyparsing is what I should be doing to solve this problem, but based on questions i've seen trying to remove comments from sql, parsing is the only real robust solution.
run_tests does not work with multiple lines. This makes it hard to test the block comment although i can use parse_string.

答案1

得分: 1

用于去除注释（或任何pyparsing表达式）的方法是使用transform_string，而不是parse_string。transform_string会扫描输入字符串并应用解析动作和抑制操作。以下是从脚本中删除Python注释的一些代码示例：

import pyparsing as pp

comment = "#" + pp.rest_of_line
# 也可以使用提供的 `python_style_comment`
# comment = pp.python_style_comment

python_source = """
def hello(s):
    # say a nice hello!
    print(f"Hi, {s}!")
    # add a hashtag for social media
    print("#hello")
"""

# 抑制注释并转换字符串
print(comment.suppress().transform_string(python_source))

输出结果：

def hello(s):
    
    print(f"Hi, {s}!")
    
    print("

糟糕，这并没有检测到#hello在引号字符串内。

为了修复这个问题，我们还需要插入一个解析引号字符串的解析器，但这些我们不会抑制：

# 首先解析引号字符串，以防它们包含注释
transform_expr = pp.quoted_string | comment.suppress()
print(transform_expr.transform_string(python_source))

现在输出结果为：

def hello(s):
    
    print(f"Hi, {s}!")
    
    print("#hello")

对于处理SQL注释，您可以做类似的操作：

sql_single_line_comment = '--' + pp.rest_of_line

# 使用pyparsing的C样式 /* ... */ 多行注释定义
sql_multi_line_comment = pp.c_style_comment

comment_remover = (
    # 将引号字符串解析为单独的部分，以避免被抑制
    pp.quoted_string
    | (sql_single_line_comment | sql_multi_line_comment).suppress()
)

sql = "SELECT * FROM STUDENTS; -- watch out for Bobby Tables!"
print(comment_remover.transform_string(sql))

输出结果：

SELECT * FROM STUDENTS;

我很高兴看到您正在使用run_tests！如果您想定义跨多行的测试，而不是使用多行字符串形式，请将测试作为字符串列表传递：

expr.run_tests([test_str1, test_str2, test_str3])

不幸的是，我没有一个调用transform_string的run_tests版本。

将行拆分为单独的语句最好在第二次处理后进行，首先去除注释。您可以尝试类似以下的方法：

semi = pp.Literal(";")
semi.add_parse_action(pp.replace_with(";\n\n"))

然后使用与去除注释相同的转换模式（但这次使用解析动作而不是抑制）。或者使用scan_string来定位分号";"终止符，然后将位于分号之间的SQL写入单独的文件（留作读者的练习）。

英文:

For stripping comments (or any pyparsing expression), you should use transform_string, not parse_string. transform_string scans through the input string and applies parse actions and suppressions. Here is some code to strip out python comments from a script:

import pyparsing as pp

comment = &quot;#&quot; + pp.rest_of_line
# could also use the provided `python_style_comment`
# comment = pp.python_style_comment

python_source = &quot;&quot;&quot;
def hello(s):
    # say a nice hello!
    print(f&quot;Hi, {s}!&quot;)
    # add a hashtag for social media
    print(&quot;#hello&quot;)
&quot;&quot;&quot;

# suppress comments and transform the string
print(comment.suppress().transform_string(python_source))

giving:

def hello(s):
    
    print(f&quot;Hi, {s}!&quot;)
    
    print(&quot;

Oops, this does not detect the fact that #hello is inside a quoted string.

To fix this, we also insert a parser for quoted strings, but these we don't suppress:

# parse quoted strings first, in case they contain a comment
transform_expr = pp.quoted_string | comment.suppress()
print(transform_expr.transform_string(python_source))

Now giving:

def hello(s):
    
    print(f&quot;Hi, {s}!&quot;)
    
    print(&quot;#hello&quot;)

For your SQL comment handling, you'll do much the same:

sql_single_line_comment = &#39;--&#39; + pp.rest_of_line

# use pyparsing&#39;s definition for C-style /* ... */ multiline comments
sql_multi_line_comment = pp.c_style_comment

comment_remover = (
    # parse quoted strings separately so that they don&#39;t get suppressed
    pp.quoted_string
    | (sql_single_line_comment | sql_multi_line_comment).suppress()
)

sql = &quot;SELECT * FROM STUDENTS; -- watch out for Bobby Tables!&quot;
print(comment_remover.transform_string(sql))

prints:

SELECT * FROM STUDENTS;

I'm glad to see you are using run_tests! If you want to define tests that span multiple lines, than instead of using the multiline string form, pass the test as a list of strings:

expr.run_tests([test_str1, test_str2, test_str3])

Unfortunately, I don't have a version of run_tests that calls transform_string.

Splitting the lines up into separate statements is best done in a second pass, after first stripping out the comments. You might try something like:

semi = pp.Literal(&quot;;&quot;)
semi.add_parse_action(pp.replace_with(&quot;;\n\n&quot;)

And then use the same transformation pattern as was used to strip comments (but this time, using the parse action instead of suppressing). Or use scan_string to locate the ";" terminators, and then write out the SQL that lies between semicolons out to their separate files (left as an exercise for the reader).

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

如何在Python 3.x中使用pyparsing从Oracle SQL脚本中删除注释？

问题

答案1

Python : How to split the given start date and end date in a dataframe into number of days falling in each month creating new row for every date split

async在FastAPI中到底是做什么的？

PostgreSQL：将记录插入到主表引用的其他表中。

“Unflatten”列为行

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

发表评论