Snowflake - aggregate function: SUM(*), AVG(*), MIN(*), MAX(*), ANY_VALUE(*)
Question
What may be the reason that the following syntax works:
SELECT SUM(*), AVG(*), MIN(*), MAX(*), ANY_VALUE(*);
Output:
|SUM(*) |AVG(*) |MIN(*) |MAX(*) |ANY_VALUE(*)|
|---|---|---|---|---|
|null|null|null|null|null|
Data types:
DESCRIBE RESULT LAST_QUERY_ID();
/*
name type kind
SUM(*) NUMBER(30,0) COLUMN
AVG(*) NUMBER(36,6) COLUMN
MIN(*) VARCHAR(0) COLUMN
MAX(*) VARCHAR(0) COLUMN
ANY_VALUE(*) VARCHAR(0) COLUMN
*/
Based on the query profile it is resolved as:
EXPLAIN USING TABULAR
SELECT SUM(*), AVG(*), MIN(*), MAX(*), ANY_VALUE(*);
[![enter image description here][1]][1]
---
When a table/subquery is provided, `*` is resolved as the first column:
SELECT SUM(*), AVG(*), MIN(*), MAX(*), ANY_VALUE(*)
FROM (VALUES (1)) sub(c);
/*
SUM(*) AVG(*) MIN(*) MAX(*) ANY_VALUE(*)
1 1 1 1 1
*/
However, it does not work if more than one column exists:
SELECT SUM(*), AVG(*), MIN(*), MAX(*), ANY_VALUE(*)
FROM (VALUES (1,2));
> *Error: too many arguments for function [SUM(VALUES.COLUMN1, VALUES.COLUMN2)]*
Also:
SELECT SUM(*)
FROM (VALUES(1))
HAVING SUM(*) > 1;
> *Use of * as a function argument is only allowed in the SELECT clause.*
---
The documentation of aggregate functions does not give any clue about `*` in the context of [SUM](https://docs.snowflake.com/en/sql-reference/functions/sum), [MIN/MAX](https://docs.snowflake.com/en/sql-reference/functions/min), etc.
I am looking for a valid use case when it may be useful.
---
**EDIT:**
Aggregate functions that accept more than a single input:
SELECT LISTAGG(*) FROM (SELECT 'a', 'b');
--error: argument 2 to function LISTAGG needs to be constant, found'"values"."'B'"'
but:
SELECT MIN_BY(*)
FROM (VALUES('a', 'b'), ('aa', 'a')) AS sub(c1, c2);
/*
MIN_BY(*)
aa
*/
---
**EDIT 2:**
It works not only for aggregate functions but for scalar functions too:
SELECT COALESCE(*)
FROM (VALUES (NULL, 'a', 'b'));
-- 'a'
[1]: https://i.stack.imgur.com/zQcX5.png
Answer 1
Score: 1
My assumption is that it is a "magic artifact" from `count(*)`, which really is an odd expression.
Well, it is and it is not. The parser will see `<inbuilt_built_function(token)><paren><star><paren>`, and `count` is an inbuilt function, as are the other functions, so I assume the `*` acts as a template meaning "put all the inputs here". That should be testable: `count(distinct *)` should give the same result as `count(distinct a, b, c)` for a three-column table.
So the expansion of `*` to "all the inputs" is just auto-magicked in, and then `sum(a, b)` fails the parameter-count check of `sum`, which is not a parse problem or an AST problem, but almost a run-time problem.
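A toy Python model of this hypothesis may make the two steps easier to see. It is only a sketch of the idea described above, not how Snowflake is actually implemented: `MAX_ARGS` and `expand_star` are hypothetical names, and the argument limits in the table are illustrative assumptions.

```python
# Toy model of the "expand *, then check the argument count" hypothesis.
# NOT Snowflake's implementation; MAX_ARGS and expand_star are invented here.

# Assumed maximum argument count per function; None means "no upper limit".
MAX_ARGS = {
    "SUM": 1,
    "MIN": 1,
    "MIN_BY": 3,
    "COALESCE": None,
}


def expand_star(func, args, input_columns):
    """Replace a lone '*' with every input column, then check the count."""
    if args == ["*"]:
        args = list(input_columns)  # step 1: "put all the inputs here"
    limit = MAX_ARGS[func]
    # step 2: the count check only runs after the expansion has happened
    if limit is not None and len(args) > limit:
        raise TypeError(
            f"too many arguments for function [{func}({', '.join(args)})]"
        )
    return f"{func}({', '.join(args)})"


print(expand_star("SUM", ["*"], ["COLUMN1"]))
print(expand_star("MIN_BY", ["*"], ["C1", "C2"]))
print(expand_star("COALESCE", ["*"], ["COLUMN1", "COLUMN2", "COLUMN3"]))
try:
    expand_star("SUM", ["*"], ["COLUMN1", "COLUMN2"])
except TypeError as err:
    print("Error:", err)
```

In this model a single-column input passes for every function, while a two-column input only fails for the functions whose limit is one argument, which matches the pattern of errors shown in the question.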
Test:
select
count(*) as c1,
count(distinct *) as c2,
count(distinct column1, column2) as c3
from values
(1,2),
(1,2),
(null, null);
Well, that didn't prove my idea, but it does show the tokenizer is generating a star token...
More thinking out loud:
select
count(*) as c1,
count(column1) as c2,
count(column2) as c3,
count(column1, column2) as c4,
--count(distinct *) as c2,
--count(distinct(column1, column2)) as c2,
count(distinct column1, column2) as c5
from values
(1,2),
(1,2),
(null, null);
select
count(*) as c1,
count(column1) as c2,
count(column2) as c3,
count(column1, column2) as c4,
count(distinct column1, column2) as c5
from values
(1,2),
(1,2),
(null, null);
Ah, yes: the `*` means "a row exists", whereas explicit column names trigger the not-null check on their values, so it is an expansion of the parameters. But it is a whole type class of parameter, "the row", that `count` must understand. `SUM`, `MAX`, etc. can handle it too, but there it just auto-magically turns into "must be only one column".
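The row-versus-value distinction can be checked with a small runnable sketch. It uses SQLite through Python's standard `sqlite3` module as a stand-in, so it only shows standard `COUNT` behavior on the same three rows, not anything Snowflake-specific; the table name `t` is made up for the example.

```python
import sqlite3

# SQLite stands in for Snowflake here (an assumption of this sketch), so only
# the standard COUNT semantics are illustrated, not Snowflake's own behavior.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (column1 INTEGER, column2 INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, 2), (1, 2), (None, None)])

# count(*) counts rows; count(column1) skips NULLs; DISTINCT also dedupes.
row_count, non_null_count, distinct_count = conn.execute(
    "SELECT count(*), count(column1), count(DISTINCT column1) FROM t"
).fetchone()
conn.close()

print(row_count, non_null_count, distinct_count)  # prints: 3 2 1
```

The all-NULL row is counted by `count(*)` but skipped by `count(column1)`, which is the "a row exists" versus "value is not null" difference described above.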
The super weak answer: because auto-magic is like a free lunch, i.e. "nicer".
Or it's "disgusting", but some big customer had lots of SQL that "could not be changed, and needed to be supported to make big $$$, so hey, why not?" I don't like these, as they seem too simple and shut down thinking. But they exist as options.
Magic Row option:
There is a really nice explanation of how PIVOT works on Medium. PIVOT has an implicit `GROUPING BY ALL` in it for the parameters of the source (those that are not named in the result column or the output columns, which is independent of the selected columns), so there is already a notion of "just take all the stuff you have been given, and make sense of it" that is required to support this. Perhaps at its core it is like `argc`/`argv` from C or `params` from C#, where it becomes a bucket of stuff that is dynamically processed. Hence the LISTAGG error, which really does show the parameter idea.
This is so much fun!