英文:
Possible to introspect collation on a string in BigQuery?
问题
以下是翻译好的部分:
假设我有以下内容:
SELECT COLLATE("hello", "und:ci")
如果以后要使用"hello"
这个字符串值,例如在比较中使用,或者存储到稍后要访问的表中,我想检查它的排序规则。是否有一种方法可以从值本身进行这样的操作(而不是检查INFORMATION_SCHEMA
)?例如,我可以使用FORMAT("%T", value)
来检查类型并获取所有这些信息(尽管可能很痛苦),是否有一种方法可以输出一个字符串值,以查看它是否有任何与之绑定的排序规则(无论是从表、列、函数等继承的)?
另一个例子,如果我创建一个JavaScript UDF 并传递一个字符串,是否有一种方法来确定(或恢复其排序规则),以便与其他内容进行比较?例如调用:
SELECT equalsInJsUDF(COLLATE("hello", "und:ci"), "HELLO")
对比:
SELECT equalsInJsUDF("hello", "HELLO")
英文:
Let's say I have the following:
SELECT COLLATE("hello", "und:ci")
If this string value of "hello"
is ever used later, for example in a comparison -- or stored to a table that is later accessed, I'd like to inspect its collation. Is there any way to do this from the value-itself (not checking INFORMATION_SCHEMA
)? For example, I can use the FORMAT("%T", value)
to inspect the type and get all that information (albeit painfully), is there any way in which I can output a string-value to see whether it has any collation that is bound to it (either inherited from the table, column, function, etc.)?
As another example, if I create a Javascript UDF and pass it a string, is there a way to determine (or recover its collation) how it should compare to others? For example calling:
SELECT equalsInJsUDF(COLLATE("hello", "und:ci"), "HELLO")
vs.
SELECT equalsInJsUDF("hello", "HELLO")
答案1
得分: 1
根据BigQuery排序规则文档,您无法在用户定义的函数(UDFs)中使用排序规则:
用户定义的函数(UDFs)不能使用排序规则参数。
CREATE FUNCTION tmp_dataset.my_udf(x STRING) AS (x); SELECT tmp_dataset.my_udf(col_ci) FROM shared_dataset.table_collation_simple; -- 用户错误: -- "不允许在参数 x ('und:ci') 上使用排序规则。 -- 使用 COLLATE(arg, '') 以在 [1:8] 移除排序规则"```
我想一个解决方法可能是使用谓词并查看 UPPER(stringVal) = LOWER(stringVal)
。但是,这仅解决了当前问题中排序规则是否影响字符串的相等性,并且类似以下的情况不会指示是否存在排序规则:
SELECT
UPPER(val)=LOWER(val) AS acts_like_ci_collation
FROM
UNNEST(['123']) as val
英文:
You cannot use collation on a UDF according to BigQuery Collation docs:
> User-defined functions (UDFs) can't take collated arguments.
>
>CREATE FUNCTION tmp_dataset.my_udf(x STRING) AS (x);
> SELECT tmp_dataset.my_udf(col_ci)
> FROM shared_dataset.table_collation_simple;
> -- User error:
> -- "Collation is not allowed on argument x ('und:ci').
> -- Use COLLATE(arg, '') to remove collation at [1:8]"
I suppose a workaround could be to use a predicate and see if UPPER(stringVal) = LOWER(stringVal)
. However, that only addresses whether that collation affects equality the current string in question, and something like the following wouldn't indicate whether it has a collation or not:
SELECT
UPPER(val)=LOWER(val) AS acts_like_ci_collation
FROM
UNNEST(['123']) as val
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论