Count number of WHERE filters in SQL query using regex

Question

Update: I've updated the test string to cover a case that I've missed.

I'm trying to count the number of WHERE filters in a query using regex.

The general idea is to count the number of WHERE and AND keywords occurring in the query, while excluding any AND that appears after a JOIN and before a WHERE, and also excluding any AND that appears inside a CASE WHEN clause.

For example, this query:

WITH cte AS (\nSELECT a,b\nFROM something\nWHERE a>10\n AND b<5)\n, cte2 AS (\n SELECT c,\nd FROM another\nWHERE c>10\nAND d<5)\n SELECT CASE WHEN c1.a=1\nAND c2.c=1 THEN 'yes' ELSE 'no' \nEND,c1.a,c1.b,c2.c,c2.d\nFROM cte c1\nINNER JOIN cte2 c2 ON c1.a = c2.c\nAND c1.b = c2.d\nWHERE c1.a<4 AND DATE(c1)>'2022-01-01'\nAND c2.c>6

-- FORMATTED FOR EASE OF READ. PLEASE USE LINE ABOVE AS REGEX TEST STRING
WITH cte AS (
  SELECT a,b 
  FROM something 
  WHERE a>10 
    AND b<5
)

, cte2 AS (
  SELECT c,d
  FROM another
  WHERE c>10
    AND d<5
)

SELECT
  CASE
      WHEN c1.a=1 AND c2.c=1 THEN 'yes'
      WHEN c1.a=1 AND c2.c=1 THEN 'maybe'
      ELSE 'no'
  END,
  c1.a,
  c1.b,
  c2.c,
  c2.d
FROM cte c1
INNER JOIN cte2 c2
   ON c1.a = c2.c
  AND c1.b = c2.d
WHERE c1.a<4
  AND DATE(c1)>'2022-01-01'
  AND c2.c>6

should return 7, which are:

  1. WHERE a>10
  2. AND b<5
  3. WHERE c>10
  4. AND d<5
  5. WHERE c1.a<4
  6. AND DATE(c1)>'2022-01-01'
  7. AND c2.c>6

The portion AND c1.b = c2.d is not counted because it appears after a JOIN and before the WHERE.

The portion AND c2.c=1 is not counted because it is inside a CASE WHEN clause.

I eventually plan to use this in a PostgreSQL query to count the number of filters that appear in all queries run in a certain period.

I've tried searching for an answer and working it out myself, but to no avail, hence I'm looking for help here. Thank you in advance!

Answer 1

Score: 2

I try to stay away from lookarounds because they can be messy and painful to use, especially given the fixed-width limitation of lookbehind assertions.

My proposed solution is to capture all scenarios in different groups, and then select only the group of interest. The undesired scenarios will still be matched, but will not be selected.

  1. Group 1 - Starts with JOIN (undesired)
  2. Group 2 - Starts with WHERE (desired)
  3. Group 3 - Starts with CASE (undesired)

(JOIN.*?(?=$|WHERE|JOIN|CASE|END))|(WHERE.*?(?=$|WHERE|JOIN|CASE|END))|(CASE.*?(?=$|WHERE|JOIN|CASE|END))

Note: Feel free to replace WHERE|JOIN|CASE|END with any keywords you want to act as the 'stopper' words.

All scenarios, including the undesired ones, will be matched, but you need to select only Group 2 (highlighted in orange in the original screenshot).
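
As a rough illustration of selecting only Group 2, here is a minimal Python sketch. It reuses the pattern above unchanged; the names `STOP`, `PATTERN`, `QUERY` and `count_where_filters` are made up for this example. Since each Group 2 match is a whole WHERE clause rather than a single filter, the sketch assumes one extra step that the answer does not spell out: counting the ANDs inside each match and adding one for the WHERE itself.

```python
import re

# Pattern copied from the answer above; group 2 is the WHERE alternative.
STOP = r"(?=$|WHERE|JOIN|CASE|END)"
PATTERN = re.compile(
    rf"(JOIN.*?{STOP})|(WHERE.*?{STOP})|(CASE.*?{STOP})",
    re.DOTALL,  # let "." cross line breaks in a multi-line query
)

# Condensed version of the query from the question.
QUERY = """\
WITH cte AS (
  SELECT a,b
  FROM something
  WHERE a>10
    AND b<5
)
, cte2 AS (
  SELECT c,d
  FROM another
  WHERE c>10
    AND d<5
)
SELECT
  CASE WHEN c1.a=1 AND c2.c=1 THEN 'yes' ELSE 'no' END,
  c1.a, c1.b, c2.c, c2.d
FROM cte c1
INNER JOIN cte2 c2
   ON c1.a = c2.c
  AND c1.b = c2.d
WHERE c1.a<4
  AND DATE(c1)>'2022-01-01'
  AND c2.c>6
"""


def count_where_filters(query: str) -> int:
    """Count each WHERE plus every AND inside a group-2 (WHERE) match."""
    total = 0
    for match in PATTERN.finditer(query):
        where_clause = match.group(2)
        if where_clause is None:  # JOIN or CASE segment: matched but ignored
            continue
        # Assumed extra step: one filter for WHERE, one more per AND.
        total += 1 + len(re.findall(r"\bAND\b", where_clause))
    return total


print(count_where_filters(QUERY))  # prints 7
```

The pattern is case-sensitive and has no word boundaries, so lowercase keywords are missed and an uppercase identifier containing END or CASE would end a segment early. A condition such as BETWEEN x AND y would also be counted as two filters by this sketch.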

Answer 2

Score: 1

You can try something like this:

WITH DataSource (parts) AS
(
  -- With the 'g' flag, REGEXP_MATCHES returns one row per match:
  -- the text from each " WHERE" up to the next ")", "|" or ";"
  SELECT REGEXP_MATCHES(
    'WITH cte AS (SELECT a,b FROM something WHERE a>10 AND b<5)\n, cte2 AS (SELECT c,d FROM another WHERE c>10 AND d<5)\n SELECT c1.a,c1.b,c2.c,c2.d FROM cte c1 INNER JOIN cte2 c2 ON c1.a = c2.c AND c1.b = c2.d WHERE c1.a<4 AND c2.c>6',
    E'(?= WHERE)[^)|;]+',
    'gmi'
  )
)
SELECT SUM
       (
           (length(parts[1]) - length(REPLACE(parts[1], 'AND', ''))) / 3 -- counting ANDs
           + 1 -- for the WHERE
       )
FROM DataSource;

The idea is to match the text that follows each WHERE, then simply count the ANDs in it and add one for the matched WHERE itself.
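
For readers who want to try the same idea outside the database, here is a minimal Python sketch. It assumes the pattern behaves the same way under Python's `re` module as in the SQL above; `WHERE_TEXT`, `QUERY` and `count_filters` are illustrative names, not part of the answer.

```python
import re

# Same pattern as the SQL above: start at " WHERE", stop at ")", "|" or ";".
WHERE_TEXT = re.compile(r"(?= WHERE)[^)|;]+", re.IGNORECASE)

# The simplified query used in this answer (no CASE, no function calls).
QUERY = (
    "WITH cte AS (SELECT a,b FROM something WHERE a>10 AND b<5)\n"
    ", cte2 AS (SELECT c,d FROM another WHERE c>10 AND d<5)\n"
    " SELECT c1.a,c1.b,c2.c,c2.d FROM cte c1 INNER JOIN cte2 c2"
    " ON c1.a = c2.c AND c1.b = c2.d WHERE c1.a<4 AND c2.c>6"
)


def count_filters(query: str) -> int:
    """Sum (number of ANDs + 1) over every matched WHERE fragment."""
    return sum(part.count("AND") + 1 for part in WHERE_TEXT.findall(query))


print(count_filters(QUERY))  # prints 6
```

The AND in the JOIN condition is skipped because it sits before " WHERE" and so is never part of a match. Note that the match stops at the first ")", so a filter such as DATE(c1)>'2022-01-01' cuts the fragment short and any AND after it goes uncounted. The pattern also requires a space directly before WHERE, so a WHERE at the start of a line is not matched.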

  • Posted on 2023-02-10 13:18:07
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75407240.html
