BigQuery中是否可以检查字符串的排序方式?

huangapple go评论59阅读模式
英文:

Possible to introspect collation on a string in BigQuery?

问题

以下是翻译好的部分:

假设我有以下内容:

SELECT COLLATE("hello", "und:ci")

如果以后要使用"hello"这个字符串值,例如在比较中使用,或者存储到稍后要访问的表中,我想检查它的排序规则。是否有一种方法可以从值本身进行这样的操作(而不是检查INFORMATION_SCHEMA)?例如,我可以使用FORMAT("%T", value)来检查类型并获取所有这些信息(尽管可能很痛苦),是否有一种方法可以输出一个字符串值,以查看它是否有任何与之绑定的排序规则(无论是从表、列、函数等继承的)?

另一个例子,如果我创建一个JavaScript UDF 并传递一个字符串,是否有一种方法来确定(或恢复其排序规则),以便与其他内容进行比较?例如调用:

SELECT equalsInJsUDF(COLLATE("hello", "und:ci"), "HELLO")

对比:

SELECT equalsInJsUDF("hello", "HELLO")

英文:

Let's say I have the following:

SELECT COLLATE("hello", "und:ci")

If this string value of "hello" is ever used later, for example in a comparison -- or stored to a table that is later accessed, I'd like to inspect its collation. Is there any way to do this from the value-itself (not checking INFORMATION_SCHEMA)? For example, I can use the FORMAT("%T", value) to inspect the type and get all that information (albeit painfully), is there any way in which I can output a string-value to see whether it has any collation that is bound to it (either inherited from the table, column, function, etc.)?


As another example, if I create a Javascript UDF and pass it a string, is there a way to determine (or recover its collation) how it should compare to others? For example calling:

SELECT equalsInJsUDF(COLLATE("hello", "und:ci"), "HELLO")

vs.

SELECT equalsInJsUDF("hello", "HELLO")

答案1

得分: 1

根据BigQuery排序规则文档,您无法在用户定义的函数(UDFs)中使用排序规则:

用户定义的函数(UDFs)不能使用排序规则参数。

CREATE FUNCTION tmp_dataset.my_udf(x STRING) AS (x);
SELECT tmp_dataset.my_udf(col_ci)
FROM shared_dataset.table_collation_simple;
-- 用户错误:
-- "不允许在参数 x ('und:ci') 上使用排序规则。
-- 使用 COLLATE(arg, '') 以在 [1:8] 移除排序规则"```

我想一个解决方法可能是使用谓词并查看 UPPER(stringVal) = LOWER(stringVal)。但是,这仅解决了当前问题中排序规则是否影响字符串的相等性,并且类似以下的情况不会指示是否存在排序规则:

SELECT 
  UPPER(val)=LOWER(val) AS acts_like_ci_collation
FROM
  UNNEST(['123']) as val
英文:

You cannot use collation on a UDF according to BigQuery Collation docs:

> User-defined functions (UDFs) can't take collated arguments.
>
>CREATE FUNCTION tmp_dataset.my_udf(x STRING) AS (x);
> SELECT tmp_dataset.my_udf(col_ci)
> FROM shared_dataset.table_collation_simple;
> -- User error:
> -- "Collation is not allowed on argument x ('und:ci').
> -- Use COLLATE(arg, '') to remove collation at [1:8]"

I suppose a workaround could be to use a predicate and see if UPPER(stringVal) = LOWER(stringVal). However, that only addresses whether that collation affects equality the current string in question, and something like the following wouldn't indicate whether it has a collation or not:

SELECT 
  UPPER(val)=LOWER(val) AS acts_like_ci_collation
FROM
  UNNEST(['123']) as val

huangapple
  • 本文由 发表于 2023年5月26日 09:42:42
  • 转载请务必保留本文链接:https://go.coder-hub.com/76337168.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定