Use Regular Expressions to get delimited strings and keep delimiter
Question
I am trying to parse a SQL statement for column names. I might have a statement that looks like:
SELECT CASE WHEN "Schema Name"."Table Name"."Column Name" = 'Thing' THEN 'Answer'
WHEN "Table Name"."Second Column Name" = 'Second Thing' THEN 'Second Answer'
WHEN "Third Column Name" = 'Third Thing' THEN 'Third Answer'
...
From that, my expected matches would be:
- "Schema Name"."Table Name"."Column Name"
- "Table Name"."Second Column Name"
- "Third Column Name"
In other words, I want to find all the strings that begin and end with a quotation mark, but if there's a "." pattern, I want the match to continue past it. It's the skipping of "." that I am struggling with. I would say the closing quotation mark should be followed by a space (" ), but someone could omit the space before a symbol ("=, "+).
I have been using <https://regex101.com> for testing, and can get /"(?:[^"."]*)" to give me each string between quotes, so "Schema Name", "Table Name", and "Column Name". But I'd prefer to get "Schema Name"."Table Name"."Column Name" all as one match, not 3 separate matches.
Not sure it matters, but I'll be wrapping this in python code. I'm basically taking in the SQL statement as a pandas row, and then I want to pull a list of all the schema/table/column names that I'll explode into multiple rows to effectively see which columns get used for a specific view.
Answer 1
Score: 2
Match the first quoted string, optionally followed by up to 2 additional quoted strings joined by . separators:
"[^"]+"(?:\."[^"]+"){0,2}
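Since the question mentions wrapping this in Python, here is a minimal sketch of how the pattern from the answer could be applied with the standard `re` module. The names `IDENTIFIER`, `extract_identifiers`, and `SAMPLE_SQL` are made up for illustration and are not part of the original question or answer.

```python
import re
from typing import List

# Pattern from the answer: one double-quoted name, optionally followed by
# up to two more double-quoted names, each joined by a literal dot.
IDENTIFIER = re.compile(r'"[^"]+"(?:\."[^"]+"){0,2}')


def extract_identifiers(sql: str) -> List[str]:
    """Return every quoted identifier chain found in the SQL text, in order."""
    # The only group in the pattern is non-capturing, so findall
    # returns the full text of each match.
    return IDENTIFIER.findall(sql)


# Sample statement taken from the question.
SAMPLE_SQL = (
    'SELECT CASE WHEN "Schema Name"."Table Name"."Column Name" = \'Thing\' THEN \'Answer\'\n'
    'WHEN "Table Name"."Second Column Name" = \'Second Thing\' THEN \'Second Answer\'\n'
    'WHEN "Third Column Name" = \'Third Thing\' THEN \'Third Answer\'\n'
)

matches = extract_identifiers(SAMPLE_SQL)
for name in matches:
    print(name)
```

Because the pattern does not look at what follows the closing quote, a missing space before a symbol (as in "Column Name"='Thing') does not affect the match. Two limitations are worth keeping in mind: a chain of more than three quoted parts would be split into separate matches because of the {0,2} limit, and a single-quoted string literal that happens to contain double-quoted text would also be picked up. For the pandas step described in the question, a column holding these lists could then be passed to `DataFrame.explode` to get one row per identifier.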