在Snowflake上使用通用表达式(CTE)中的外部函数。

huangapple go评论51阅读模式
英文:

Using External Functions in a CTE on Snowflake

问题

在Snowflake的CTE(Common Table Expressions)中调用集成了API的外部函数是否可能?

请参考下面的示例,这是我尝试的内容:

with question_batch as (
    select *
    from (
        values('who are you?')
            ,('what are you?')
            ,('how are you?')
        ) as t(question)
    )
select 
    qb.question
    ,external_api(qb.question) as response
from question_batch qb;

当尝试像代码中所示调用API时,我收到以下错误消息:

Error: "data" JSON array must contain exact number of rows as the
request. Response batch expected to have 3 rows, but received 1 rows.

我可以在API日志中看到,API成功调用了第一个问题并返回了成功的响应。

英文:

Is it possible to call an external function that is integrated with an API within a CTE (Common Table Expressions) within Snowflake?

Please see the example below for what I'm trying.

with question_batch as (
    select *
    from (
        values('who are you?')
            ,('what are you?')
            ,('how are you?')
        ) as t(question)
    )
select 
    qb.question
    ,external_api(qb.question) as response
from question_batch qb;

I get the following error when attempting to call the API as shown in the code:

> Error: "data" JSON array must contain exact number of rows as the
> request. Response batch expected to have 3 rows, but received 1 rows.

I can see in the API logs that the API is called successfully for the first question and it returns a successful response.

答案1

得分: 1

你遇到的问题不是由CTE引起的,而是由Snowflake如何利用外部函数引起的。Snowflake通过将来自查询的一批行传递给外部函数来优化性能,而不是逐行调用它们。

对于你的示例代码,你的外部函数收到的请求体将类似于以下内容:

{
  "data": [
    [0, "你是谁?"],
    [1, "你是什么?"],
    [2, "你好吗?"]
  ]
}

请求体是一个简单的JSON对象,其中包含一个名为data的单一键,该键包含表示使用函数的查询中的一批行(行集)的JSON数组。数组中的每个元素本身都是一个JSON数组,其中包含行索引,后跟函数调用的实际参数。

你的外部函数需要遍历每个行元素,并为每个行单独提供结果。例如,如果你的外部函数进行API调用,它将分别为每个行进行这些调用。

你遇到的错误表明,你的响应仅包含了单个行的结果,而Snowflake期望每个3个行的响应。以下是响应应该如何看起来的示例:

{
  "data": [
    [0, "Quins"],
    [1, "我是一个人类"],
    [2, "我很聪明!"]
  ]
}

在此响应中,每个行再次表示为JSON数组,其中第一个元素是来自请求的对应行的索引,后跟该行的结果。

要获取更详细的信息,你可以参考Snowflake文档:

https://docs.snowflake.com/en/sql-reference/external-functions-data-format

英文:

The issue you're encountering is not caused by the CTE, but rather by how Snowflake utilises external functions. Snowflake optimises performance by calling external functions with batches of rows from your query, instead of calling them row by row.

For your example code, the request body that your external function received would have looked something like this:

{
  "data": [
    [0, "who are you?"],
    [1, "what are you?"],
    [2, "how are you?"]
  ]
}

The request body is a simple JSON object with a single key named data, which contains a JSON array representing a batch of rows (rowset) from the query that uses the function. Each element in the array is itself a JSON array, with a row index followed by the actual arguments from the function call.

Your external function needs to iterate over each row element and provide a result for each row individually. For instance, if your external function makes API calls, it would make those calls separately for each row.

The error you encountered suggests that your response only included the result for a single row, whereas Snowflake expected a response for each of the 3 rows. Here's an example of how the response should look:

{
  "data": [
    [0, "Quins"],
    [1, "I'm a human"],
    [2, "I'm brilliant!"]
  ]
}

In this response, each row is again represented as a JSON array, where the first element is the index of the corresponding row from the request, followed by the result for that row.

For more detailed information, you can refer to the Snowflake documentation:

https://docs.snowflake.com/en/sql-reference/external-functions-data-format

huangapple
  • 本文由 发表于 2023年6月22日 12:40:24
  • 转载请务必保留本文链接:https://go.coder-hub.com/76528648.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定