Use Regular Expressions to get delimited strings and keep delimiter

Question

I am trying to parse a SQL statement for column names. I might have a statement that looks like:

SELECT CASE WHEN "Schema Name"."Table Name"."Column Name" = 'Thing' THEN 'Answer'
            WHEN "Table Name"."Second Column Name" = 'Second Thing' THEN 'Second Answer'
            WHEN "Third Column Name" = 'Third Thing' THEN 'Third Answer'
...

From that, my expected matches would be:

  • "Schema Name"."Table Name"."Column Name"
  • "Table Name"."Second Column Name"
  • "Third Column Name"

In other words, I want to find every string that begins and ends with a double quote, but when a closing quote is immediately followed by "." (a dot and the next opening quote), I want the match to continue through that piece rather than stop. Skipping over the "." is the part I'm struggling with. I could require the closing quote to be followed by a space (" ), but someone might leave out the space before an operator ("=, "+).

I have been using <https://regex101.com> for testing, and I can get /"(?:[^"."]*)"/ to give me each string between quotes, so "Schema Name", "Table Name", and "Column Name". But I'd prefer to get "Schema Name"."Table Name"."Column Name" as one match, not three separate matches.
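
Reproduced with Python's standard re module, the attempt behaves as described: each quoted piece comes back as its own match. The sample string below is simply the first line of the SQL above. The underlying reason is that [^"."] is a character class: it matches one character that is neither " nor . (the repeated " adds nothing), so it is the same as [^".] and cannot express "step over the sequence "."".

```python
import re

# First line of the sample statement from the question.
sql_line = """SELECT CASE WHEN "Schema Name"."Table Name"."Column Name" = 'Thing' THEN 'Answer'"""

# The attempted pattern: an opening quote, a run of characters that are
# neither '"' nor '.', then a closing quote.
attempt = r'"(?:[^"."]*)"'

# A character class lists single characters, so [^"."] is the same set as [^".].
equivalent = r'"(?:[^".]*)"'

pieces = re.findall(attempt, sql_line)
print(pieces)
print(re.findall(equivalent, sql_line) == pieces)
```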

Not sure it matters, but I'll be wrapping this in Python code. I'm taking in the SQL statement from a pandas row, and I want to pull out a list of all the schema/table/column names, which I'll then explode into multiple rows to see which columns are used by a specific view.
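
A minimal sketch of that pandas workflow, assuming a DataFrame with hypothetical view_name and sql columns (the real column names aren't given in the question). It uses the regex from Answer 1 below with Series.str.findall, which returns a list of matches per row, and then DataFrame.explode to give each matched name its own row.

```python
import pandas as pd

# Regex from Answer 1: a quoted identifier optionally followed by up to two
# more ".<quoted identifier>" parts (schema.table.column).
IDENTIFIER = r'"[^"]+"(?:\."[^"]+"){0,2}'

# Hypothetical input: one row per view, holding that view's SQL text.
views = pd.DataFrame(
    {
        "view_name": ["example_view"],
        "sql": [
            """SELECT CASE WHEN "Schema Name"."Table Name"."Column Name" = 'Thing' THEN 'Answer'
            WHEN "Table Name"."Second Column Name" = 'Second Thing' THEN 'Second Answer'
            WHEN "Third Column Name" = 'Third Thing' THEN 'Third Answer'"""
        ],
    }
)

# One list of matched identifiers per row...
views["identifier"] = views["sql"].str.findall(IDENTIFIER)

# ...then one row per identifier, keeping the view it came from.
used = views.explode("identifier")[["view_name", "identifier"]].reset_index(drop=True)
print(used)
```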


Answer 1

Score: 2

Match the first quoted string, optionally followed by up to two more quoted strings, each joined by a dot (so a single match covers at most a three-part schema.table.column name):

"[^"]+"(?:\."[^"]+"){0,2}

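In Python the answer's pattern can be used directly with re.findall. Because the repeated group is non-capturing, (?:...), findall returns the whole match, so each dotted name comes back as one string. The compact string below is a made-up example with no spaces around the operators, to check the "= / "+ concern from the question; the last call shows, for comparison, what findall returns if the group is made capturing instead.

```python
import re

# A quoted identifier, then up to two more ".<quoted identifier>" parts.
# The repeated group is non-capturing, so findall returns the whole match.
IDENTIFIER = re.compile(r'"[^"]+"(?:\."[^"]+"){0,2}')

sql = """SELECT CASE WHEN "Schema Name"."Table Name"."Column Name" = 'Thing' THEN 'Answer'
            WHEN "Table Name"."Second Column Name" = 'Second Thing' THEN 'Second Answer'
            WHEN "Third Column Name" = 'Third Thing' THEN 'Third Answer'"""

matches = IDENTIFIER.findall(sql)
print(matches)

# Made-up input: no space between a closing quote and "+" or "=".
compact = '"Table Name"."Amount"+"Table Name"."Tax"="Total"'
compact_matches = IDENTIFIER.findall(compact)
print(compact_matches)

# For comparison: with a capturing group, findall returns only the group's
# last repetition ('' when the group did not take part in the match).
capturing = re.findall(r'"[^"]+"(\."[^"]+"){0,2}', sql)
print(capturing)
```

If the SQL can contain four-part names (for example database.schema.table.column), raising the upper bound to {0,3}, or replacing {0,2} with *, keeps them in a single match.
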
huangapple
  • Posted on May 22, 2023 23:53:20
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76307940.html