英文:
Can a BigQuery UDF have a variable number of arguments (varargs)?
问题
在BigQuery中是否有可能像这样做?
CREATE OR REPLACE FUNCTION `mydataset.variant_test.any_are_null`(val1 ANY TYPE, val2 ANY TYPE, ...) AS (
val1 IS NULL OR val2 IS NULL OR ...
);
换句话说,我想要能够以简洁的方式检查任何列/值是否为null,类似于:
SELECT any_are_null(col1, col2, col3, col4), any_are_null(col1, col2)
而不需要预先知道参数的数量。我想能够像一些其他语言中的 ...
或 *
那样压缩/展开值可能也是一种选项。
英文:
Is there a possible way to do something like this in BigQuery?
CREATE OR REPLACE FUNCTION `mydataset.variant_test.any_are_null`(val1 ANY TYPE, val2 ANY TYPE, ...) AS (
val1 IS NULL OR val2 IS NULL OR ...
);
In other words, I want to be able to check if any columns/values are null in a concise way, something like:
SELECT any_are_null(col1, col2, col3, col4), any_are_null(col1, col2)
Without knowing the number of arguments beforehand. I suppose being able to zip/unzip the values would be one option as well (such as ...
or *
in some languages).
答案1
得分: 6
> *我希望能以简洁的方式检查是否存在任何空列/空值*
可以通过使用内置函数[GREATEST](https://cloud.google.com/bigquery/docs/reference/standard-sql/mathematical_functions#GREATEST)来实现:
> **GREATEST(X1,...,XN)**
>
> 返回X1,...,XN中的最大值。**如果任何参数为NULL,则返回NULL。**
然后`any_are_null`实际上相当于:
```sql
SELECT GREATEST(col1, col2, col3, col4)
FROM tab;
或者:
SELECT GREATEST(col1, col2, col3, col4) IS NULL
FROM tab;
SELECT
CASE WHEN GREATEST(col1, col2, col3, col4) IS NULL THEN '至少有一个NULL值'
ELSE '未检测到任何NULL'
END
FROM tab;
英文:
> I want to be able to check if any columns/values are null in a concise way
It is achievable by using built-in GREATEST:
> GREATEST(X1,...,XN)
>
> Returns the greatest value among X1,...,XN. If any argument is NULL, returns NULL.
Then any_are_null
is effectively equivalent of:
SELECT GREATEST(col1, col2, col3, col4)
FROM tab;
or:
SELECT GREATEST(col1, col2, col3, col4) IS NULL
FROM tab;
SELECT
CASE WHEN GREATEST(col1, col2, col3, col4) IS NULL THEN 'At least one NULL value'
ELSE 'No NULLs detected'
END
FROM tab;
答案2
得分: 3
看起来你已经回答了自己的问题 - BigQuery不支持UDF(用户定义函数)的可变参数长度(也称为vargs / varargs)。
以下是我尝试使用数组的方法(实际上在vargs的语法糖下也会使用数组)。与你的方法非常相似,但有一些小的调整,例如更狭窄的类型(并且不情愿地使用了一个临时函数,以便在Dekart BigQuery Playground中运行):
CREATE TEMPORARY FUNCTION hasNullValue(arr ARRAY<INT64>)
RETURNS BOOLEAN
AS (
EXISTS (
SELECT value FROM UNNEST(arr) AS value WHERE value IS NULL
)
);
SELECT hasNullValue([1, 2, NULL, 3]) AS contains_null;
英文:
Looks like you've answered your own question - BigQuery doesn't provide variable argument lengths for UDFs (a.k.a. vargs / varargs).
Below is my attempt using an array instead (which would be used under the hood of varargs syntactic sugar anyway). Very similar to yours but with minor tweaks such as narrower typing (and reluctantly using a temporary function so it could be run in the Dekart BigQuery Playground):
CREATE TEMPORARY FUNCTION hasNullValue(arr ARRAY<INT64>)
RETURNS BOOLEAN
AS (
EXISTS (
SELECT value FROM UNNEST(arr) AS value WHERE value IS NULL
)
);
SELECT hasNullValue([1, 2, NULL, 3]) AS contains_null;
答案3
得分: 1
如果您能将值列表修改为一个数组(当然可以是可变长度的),然后可以使用类似以下方式使其工作:
CREATE OR REPLACE FUNCTION `mydataset.variant_test.any_val_is_null`(arr ANY TYPE) AS (
(SELECT EXISTS(SELECT 1 FROM UNNEST(arr) elem WHERE elem IS NULL LIMIT 1))
);
select variant_test.any_val_is_null([1,2,null,3]) as val union all
select variant_test.any_val_is_null([1,2]) union all
select variant_test.any_val_is_null([1,2,3,4,5,6,7,8])
除了将值合并到数组中(或者结构体或JSON,但这将更加困难),目前在BigQuery中没有办法传递可变长度的参数给UDF。
英文:
If you are able to modify the values list into an array (which can be variable-length of course) then you can use something like the following to get it to work:
CREATE OR REPLACE FUNCTION `mydataset.variant_test.any_val_is_null`(arr ANY TYPE) AS (
(SELECT EXISTS(SELECT 1 FROM UNNEST(arr) elem WHERE elem IS NULL LIMIT 1))
);
select variant_test.any_val_is_null([1,2,null,3]) as val union all
select variant_test.any_val_is_null([1,2]) union all
select variant_test.any_val_is_null([1,2,3,4,5,6,7,8])
Beyond consolidating values into an array (or struct or json, but it'd be more difficult), there's no way currently to pass variable-length arguments to a UDF in BigQuery.
答案4
得分: 0
你可以使用 ARRAY_AGG
和 ARRAY_LENGTH
函数的组合。AnyAreNull
函数使用 ARRAY_AGG
函数来聚合数组 arr
中非空的值。然后它将聚合后的数组长度与原始数组的长度进行比较。如果长度不同,这意味着原始数组中至少有一个空值。
CREATE TEMP FUNCTION AnyAreNull(arr ANY TYPE) AS (
(SELECT ARRAY_LENGTH(ARRAY_AGG(val)) < ARRAY_LENGTH(arr)
FROM UNNEST(arr) AS val)
);
SELECT
AnyAreNull([col1, col2, col3, col4]) AS any_null1,
AnyAreNull([col1, col2]) AS any_null2
FROM your_table;
我尝试创建了一个名为 AnyAreNull
的临时函数,它接受一个数组作为输入,如果数组中有任何一个值为 NULL
,则返回 TRUE
,否则返回 FALSE
。
在 SELECT
语句中,我对不同的列数组([col1, col2, col3, col4]
和 [col1, col2]
)调用了 AnyAreNull
函数以检查空值。
在这个上下文中,我使用了 UNNEST
函数将输入的数组转换为一个表,以便可以应用 COUNTIF
函数来检查是否存在 NULL
值。
英文:
You can use a combination of the ARRAY_AGG
and ARRAY_LENGTH
functions. The AnyAreNull function uses the ARRAY_AGG function to aggregate the non-null values in the array arr. It then compares the length of the aggregated array with the length of the original array. If the lengths are different, it means there was at least one NULL value in the original array.
CREATE TEMP FUNCTION AnyAreNull(arr ANY TYPE) AS (
(SELECT ARRAY_LENGTH(ARRAY_AGG(val)) < ARRAY_LENGTH(arr)
FROM UNNEST(arr) AS val)
);
SELECT
AnyAreNull([col1, col2, col3, col4]) AS any_null1,
AnyAreNull([col1, col2]) AS any_null2
FROM your_table;
I tried to create a temporary function called AnyAreNull
, which takes an array as input and returns TRUE
if any of the values in the array are NULL
, and FALSE
otherwise.
Inside the SELECT
statement, I called the AnyAreNull
function on different arrays of columns ([col1, col2, col3, col4]
and [col1, col2]
) to check for null values.
In this context, I used the UNNEST
function to convert the input arrays into a table, allowing me to apply the COUNTIF
function to check for NULL
values.
.
答案5
得分: -1
你可以使用ARRAY_TO_STRING函数,并指定分隔符和null_text(https://cloud.google.com/bigquery/docs/reference/standard-sql/array_functions#array_to_string),然后使用LIKE运算符 (https://cloud.google.com/bigquery/docs/reference/standard-sql/operators#like_operator) 来检查是否有两个相邻的分隔符,如果是的话,则表示数组中存在null值。
英文:
You can use ARRAY_TO_STRING with delimiter and null_text(https://cloud.google.com/bigquery/docs/reference/standard-sql/array_functions#array_to_string), next use LIKE operator (https://cloud.google.com/bigquery/docs/reference/standard-sql/operators#like_operator) to check if have two delimiter adjacent then we have null value in array
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论