Count number of WHERE filters in SQL query using regex
Question
Update: I've updated the test string to cover a case that I had missed.
I'm trying to count the number of WHERE filters in a query using regex.
The general idea is to count the occurrences of WHERE and AND in the query, while excluding any AND that appears after a JOIN and before a WHERE, and also excluding any AND that appears inside a CASE WHEN clause.
For example, this query:
WITH cte AS (\nSELECT a,b\nFROM something\nWHERE a>10\n AND b<5)\n, cte2 AS (\n SELECT c,\nd FROM another\nWHERE c>10\nAND d<5)\n SELECT CASE WHEN c1.a=1\nAND c2.c=1 THEN 'yes' ELSE 'no' \nEND,c1.a,c1.b,c2.c,c2.d\nFROM cte c1\nINNER JOIN cte2 c2 ON c1.a = c2.c\nAND c1.b = c2.d\nWHERE c1.a<4 AND DATE(c1)>'2022-01-01'\nAND c2.c>6
-- FORMATTED FOR EASE OF READING. PLEASE USE THE LINE ABOVE AS THE REGEX TEST STRING
WITH cte AS (
    SELECT a,b
    FROM something
    WHERE a>10
      AND b<5
)
, cte2 AS (
    SELECT c,d
    FROM another
    WHERE c>10
      AND d<5
)
SELECT
    CASE
        WHEN c1.a=1 AND c2.c=1 THEN 'yes'
        WHEN c1.a=1 AND c2.c=1 THEN 'maybe'
        ELSE 'no'
    END,
    c1.a,
    c1.b,
    c2.c,
    c2.d
FROM cte c1
INNER JOIN cte2 c2
    ON c1.a = c2.c
    AND c1.b = c2.d
WHERE c1.a<4
  AND DATE(c1)>'2022-01-01'
  AND c2.c>6
should return 7, which are:
1. WHERE a>10
2. AND b<5
3. WHERE c>10
4. AND d<5
5. WHERE c1.a<4
6. AND DATE(c1)>'2022-01-01'
7. AND c2.c>6
The portion AND c1.b = c2.d is not counted because it comes after JOIN and before WHERE.
The portion AND c2.c=1 is not counted because it is inside a CASE WHEN clause.
I eventually plan to use this in a PostgreSQL query to count the number of filters that occur across all queries in a certain period.
I've tried searching for an answer and attempting it myself, but to no avail, hence asking for help here. Thank you in advance!
Answer 1
Score: 2
I try to stay away from lookarounds, as they can be messy and painful to use, especially given the fixed-width limitation of lookbehind assertions.
My proposed solution is to capture all scenarios in different groups and then select only the group of interest. The undesired scenarios will still be matched, but will not be selected.
- Group 1: starts with JOIN (undesired)
- Group 2: starts with WHERE (desired)
- Group 3: starts with CASE (undesired)
(JOIN.*?(?=$|WHERE|JOIN|CASE|END))|(WHERE.*?(?=$|WHERE|JOIN|CASE|END))|(CASE.*?(?=$|WHERE|JOIN|CASE|END))
Note: Feel free to replace WHERE|JOIN|CASE|END with any keywords you want to act as the 'stopper' words.
All scenarios, including the undesired ones, will be matched, but you need to select only Group 2 (highlighted in orange).
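As a rough illustration of how the group selection could be turned into a count, here is a minimal Python sketch. It reuses the pattern above unchanged. The function name count_where_filters, the use of re.DOTALL (so that . also crosses newlines), and the step of counting AND inside each Group 2 match are assumptions added for this example, not part of the answer. It is case-sensitive and will miscount if a keyword appears inside an identifier or string literal, or if a filter uses BETWEEN ... AND.

```python
import re

# Pattern from the answer, unchanged. re.DOTALL is an assumption made here so
# that "." also matches the newlines in the test string.
PATTERN = re.compile(
    r"(JOIN.*?(?=$|WHERE|JOIN|CASE|END))"
    r"|(WHERE.*?(?=$|WHERE|JOIN|CASE|END))"
    r"|(CASE.*?(?=$|WHERE|JOIN|CASE|END))",
    re.DOTALL,
)


def count_where_filters(sql: str) -> int:
    """Count WHERE filters: one per WHERE plus one per AND in the same chunk."""
    total = 0
    for match in PATTERN.finditer(sql):
        where_chunk = match.group(2)  # Group 2 is the WHERE alternative
        if where_chunk is None:
            continue  # a JOIN or CASE chunk: matched but deliberately ignored
        total += 1 + len(re.findall(r"\bAND\b", where_chunk))
    return total


# The question's test string, with \n interpreted as real newlines.
SQL = (
    "WITH cte AS (\nSELECT a,b\nFROM something\nWHERE a>10\n AND b<5)\n"
    ", cte2 AS (\n SELECT c,\nd FROM another\nWHERE c>10\nAND d<5)\n"
    " SELECT CASE WHEN c1.a=1\nAND c2.c=1 THEN 'yes' ELSE 'no' \n"
    "END,c1.a,c1.b,c2.c,c2.d\n"
    "FROM cte c1\nINNER JOIN cte2 c2 ON c1.a = c2.c\nAND c1.b = c2.d\n"
    "WHERE c1.a<4 AND DATE(c1)>'2022-01-01'\nAND c2.c>6"
)

print(count_where_filters(SQL))  # 7 for the question's test string
```

The AND after JOIN and the AND inside CASE WHEN fall into Groups 1 and 3 respectively, so they are consumed by the regex but never added to the total.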
Answer 2
Score: 1
You can try something like this:
WITH DataSource (parts) AS
(
    SELECT REGEXP_MATCHES(
        'WITH cte AS (SELECT a,b FROM something WHERE a>10 AND b<5)\n, cte2 AS (SELECT c,d FROM another WHERE c>10 AND d<5)\n SELECT c1.a,c1.b,c2.c,c2.d FROM cte c1 INNER JOIN cte2 c2 ON c1.a = c2.c AND c1.b = c2.d WHERE c1.a<4 AND c2.c>6',
        E'(?= WHERE)[^)|;]+',
        'gmi'
    )
)
SELECT SUM
(
    (length(parts[1]) - length(REPLACE(parts[1], 'AND', ''))) / 3 -- counting ANDs
    + 1 -- for the WHERE
)
FROM DataSource
The idea is to match the text of each WHERE clause (from the space before WHERE up to the next closing parenthesis, pipe, or semicolon), then simply count the ANDs and add one for the matched WHERE itself.
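For trying the same idea outside the database, the following is a minimal Python sketch of the logic in the query above. The function name count_filters_after_where is hypothetical, and re.IGNORECASE stands in for the 'i' flag. As in the SQL version, AND is counted case-sensitively as a plain substring, and the pattern only matches a WHERE that is preceded by a space.

```python
import re

# Same pattern as in the SQL above: start at " WHERE" and run until the next
# ")", "|" or ";" (or the end of the string).
WHERE_CLAUSE = re.compile(r"(?= WHERE)[^)|;]+", re.IGNORECASE)


def count_filters_after_where(sql: str) -> int:
    """Sum, over every matched WHERE clause, the number of ANDs plus one."""
    total = 0
    for clause in WHERE_CLAUSE.findall(sql):
        # str.count mirrors the length/REPLACE arithmetic used in the SQL.
        total += clause.count("AND") + 1
    return total


# The answer's own test string (it has no CASE WHEN branch).
QUERY = (
    "WITH cte AS (SELECT a,b FROM something WHERE a>10 AND b<5)\n"
    ", cte2 AS (SELECT c,d FROM another WHERE c>10 AND d<5)\n"
    " SELECT c1.a,c1.b,c2.c,c2.d FROM cte c1 INNER JOIN cte2 c2 "
    "ON c1.a = c2.c AND c1.b = c2.d WHERE c1.a<4 AND c2.c>6"
)

print(count_filters_after_where(QUERY))  # 6 for this string
```

The AND in the JOIN condition is skipped only because it sits before the final WHERE, so each match starts after it. Nothing in this approach handles a CASE WHEN that appears inside a matched clause.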