BigQuery UDF是否可以具有可变数量的参数(varargs)?

huangapple go评论65阅读模式
英文:

Can a BigQuery UDF have a variable number of arguments (varargs)?

问题

在BigQuery中是否有可能像这样做?

CREATE OR REPLACE FUNCTION `mydataset.variant_test.any_are_null`(val1 ANY TYPE, val2 ANY TYPE, ...) AS (
  val1 IS NULL OR val2 IS NULL OR ...
);

换句话说,我想要能够以简洁的方式检查任何列/值是否为null,类似于:

SELECT any_are_null(col1, col2, col3, col4), any_are_null(col1, col2)

而不需要预先知道参数的数量。我想能够像一些其他语言中的 ...* 那样压缩/展开值可能也是一种选项。

英文:

Is there a possible way to do something like this in BigQuery?

CREATE OR REPLACE FUNCTION `mydataset.variant_test.any_are_null`(val1 ANY TYPE, val2 ANY TYPE, ...) AS (
  val1 IS NULL OR val2 IS NULL OR ...
);

In other words, I want to be able to check if any columns/values are null in a concise way, something like:

SELECT any_are_null(col1, col2, col3, col4), any_are_null(col1, col2)

Without knowing the number of arguments beforehand. I suppose being able to zip/unzip the values would be one option as well (such as ... or * in some languages).

答案1

得分: 6

> *我希望能以简洁的方式检查是否存在任何空列/空值*

可以通过使用内置函数[GREATEST](https://cloud.google.com/bigquery/docs/reference/standard-sql/mathematical_functions#GREATEST)来实现:

> **GREATEST(X1,...,XN)**
>
> 返回X1,...,XN中的最大值。**如果任何参数为NULL,则返回NULL。**

然后`any_are_null`实际上相当于:

```sql
SELECT GREATEST(col1, col2, col3, col4)
FROM tab;

或者:

SELECT GREATEST(col1, col2, col3, col4) IS NULL
FROM tab;
SELECT 
   CASE WHEN GREATEST(col1, col2, col3, col4) IS NULL THEN '至少有一个NULL值'
        ELSE '未检测到任何NULL'
   END
FROM tab;
英文:

> I want to be able to check if any columns/values are null in a concise way

It is achievable by using built-in GREATEST:

> GREATEST(X1,...,XN)
>
> Returns the greatest value among X1,...,XN. If any argument is NULL, returns NULL.

Then any_are_null is effectively equivalent of:

SELECT GREATEST(col1, col2, col3, col4)
FROM tab;

or:

SELECT GREATEST(col1, col2, col3, col4) IS NULL
FROM tab;

SELECT 
   CASE WHEN GREATEST(col1, col2, col3, col4) IS NULL THEN 'At least one NULL value'
        ELSE 'No NULLs detected'
   END
FROM tab;

答案2

得分: 3

看起来你已经回答了自己的问题 - BigQuery不支持UDF(用户定义函数)的可变参数长度(也称为vargs / varargs)。

以下是我尝试使用数组的方法(实际上在vargs的语法糖下也会使用数组)。与你的方法非常相似,但有一些小的调整,例如更狭窄的类型(并且不情愿地使用了一个临时函数,以便在Dekart BigQuery Playground中运行):

CREATE TEMPORARY FUNCTION hasNullValue(arr ARRAY<INT64>)
RETURNS BOOLEAN
AS (
  EXISTS (
    SELECT value FROM UNNEST(arr) AS value WHERE value IS NULL
  )
);
SELECT hasNullValue([1, 2, NULL, 3]) AS contains_null;
英文:

Looks like you've answered your own question - BigQuery doesn't provide variable argument lengths for UDFs (a.k.a. vargs / varargs).

Below is my attempt using an array instead (which would be used under the hood of varargs syntactic sugar anyway). Very similar to yours but with minor tweaks such as narrower typing (and reluctantly using a temporary function so it could be run in the Dekart BigQuery Playground):

CREATE TEMPORARY FUNCTION hasNullValue(arr ARRAY&lt;INT64&gt;)
RETURNS BOOLEAN
AS (
  EXISTS (
    SELECT value FROM UNNEST(arr) AS value WHERE value IS NULL
  )
);
SELECT hasNullValue([1, 2, NULL, 3]) AS contains_null;

答案3

得分: 1

如果您能将值列表修改为一个数组(当然可以是可变长度的),然后可以使用类似以下方式使其工作:

CREATE OR REPLACE FUNCTION `mydataset.variant_test.any_val_is_null`(arr ANY TYPE) AS (
  (SELECT EXISTS(SELECT 1 FROM UNNEST(arr) elem WHERE elem IS NULL LIMIT 1))
);

select variant_test.any_val_is_null([1,2,null,3]) as val union all
select variant_test.any_val_is_null([1,2]) union all
select variant_test.any_val_is_null([1,2,3,4,5,6,7,8])

除了将值合并到数组中(或者结构体或JSON,但这将更加困难),目前在BigQuery中没有办法传递可变长度的参数给UDF。

英文:

If you are able to modify the values list into an array (which can be variable-length of course) then you can use something like the following to get it to work:

CREATE OR REPLACE FUNCTION `mydataset.variant_test.any_val_is_null`(arr ANY TYPE) AS (
  (SELECT EXISTS(SELECT 1 FROM UNNEST(arr) elem WHERE elem IS NULL LIMIT 1))
);

select variant_test.any_val_is_null([1,2,null,3]) as val union all
select variant_test.any_val_is_null([1,2]) union all
select variant_test.any_val_is_null([1,2,3,4,5,6,7,8])

Beyond consolidating values into an array (or struct or json, but it'd be more difficult), there's no way currently to pass variable-length arguments to a UDF in BigQuery.

BigQuery UDF是否可以具有可变数量的参数(varargs)?

答案4

得分: 0

你可以使用 ARRAY_AGGARRAY_LENGTH 函数的组合。AnyAreNull 函数使用 ARRAY_AGG 函数来聚合数组 arr 中非空的值。然后它将聚合后的数组长度与原始数组的长度进行比较。如果长度不同,这意味着原始数组中至少有一个空值。

CREATE TEMP FUNCTION AnyAreNull(arr ANY TYPE) AS (
  (SELECT ARRAY_LENGTH(ARRAY_AGG(val)) < ARRAY_LENGTH(arr) 
   FROM UNNEST(arr) AS val)
);

SELECT 
  AnyAreNull([col1, col2, col3, col4]) AS any_null1,
  AnyAreNull([col1, col2]) AS any_null2
FROM your_table;

我尝试创建了一个名为 AnyAreNull 的临时函数,它接受一个数组作为输入,如果数组中有任何一个值为 NULL,则返回 TRUE,否则返回 FALSE

SELECT 语句中,我对不同的列数组([col1, col2, col3, col4][col1, col2])调用了 AnyAreNull 函数以检查空值。

在这个上下文中,我使用了 UNNEST 函数将输入的数组转换为一个表,以便可以应用 COUNTIF 函数来检查是否存在 NULL 值。

英文:

You can use a combination of the ARRAY_AGG and ARRAY_LENGTH functions. The AnyAreNull function uses the ARRAY_AGG function to aggregate the non-null values in the array arr. It then compares the length of the aggregated array with the length of the original array. If the lengths are different, it means there was at least one NULL value in the original array.

    CREATE TEMP FUNCTION AnyAreNull(arr ANY TYPE) AS (
      (SELECT ARRAY_LENGTH(ARRAY_AGG(val)) &lt; ARRAY_LENGTH(arr) 
       FROM UNNEST(arr) AS val)
    );

SELECT 
  AnyAreNull([col1, col2, col3, col4]) AS any_null1,
  AnyAreNull([col1, col2]) AS any_null2
FROM your_table;

I tried to create a temporary function called AnyAreNull, which takes an array as input and returns TRUE if any of the values in the array are NULL, and FALSE otherwise.

Inside the SELECT statement, I called the AnyAreNull function on different arrays of columns ([col1, col2, col3, col4] and [col1, col2]) to check for null values.

In this context, I used the UNNEST function to convert the input arrays into a table, allowing me to apply the COUNTIF function to check for NULL values.
.

答案5

得分: -1

你可以使用ARRAY_TO_STRING函数,并指定分隔符和null_text(https://cloud.google.com/bigquery/docs/reference/standard-sql/array_functions#array_to_string),然后使用LIKE运算符 (https://cloud.google.com/bigquery/docs/reference/standard-sql/operators#like_operator) 来检查是否有两个相邻的分隔符,如果是的话,则表示数组中存在null值。

英文:

You can use ARRAY_TO_STRING with delimiter and null_text(https://cloud.google.com/bigquery/docs/reference/standard-sql/array_functions#array_to_string), next use LIKE operator (https://cloud.google.com/bigquery/docs/reference/standard-sql/operators#like_operator) to check if have two delimiter adjacent then we have null value in array

huangapple
  • 本文由 发表于 2023年6月22日 08:20:40
  • 转载请务必保留本文链接:https://go.coder-hub.com/76527906.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定