英文:
Using External Functions in a CTE on Snowflake
问题
在Snowflake的CTE(Common Table Expressions)中调用集成了API的外部函数是否可能?
请参考下面的示例,这是我尝试的内容:
with question_batch as (
select *
from (
values('who are you?')
,('what are you?')
,('how are you?')
) as t(question)
)
select
qb.question
,external_api(qb.question) as response
from question_batch qb;
当尝试像代码中所示调用API时,我收到以下错误消息:
Error: "data" JSON array must contain exact number of rows as the
request. Response batch expected to have 3 rows, but received 1 rows.
我可以在API日志中看到,API成功调用了第一个问题并返回了成功的响应。
英文:
Is it possible to call an external function that is integrated with an API within a CTE (Common Table Expressions) within Snowflake?
Please see the example below for what I'm trying.
with question_batch as (
select *
from (
values('who are you?')
,('what are you?')
,('how are you?')
) as t(question)
)
select
qb.question
,external_api(qb.question) as response
from question_batch qb;
I get the following error when attempting to call the API as shown in the code:
> Error: "data" JSON array must contain exact number of rows as the
> request. Response batch expected to have 3 rows, but received 1 rows.
I can see in the API logs that the API is called successfully for the first question and it returns a successful response.
答案1
得分: 1
你遇到的问题不是由CTE引起的,而是由Snowflake如何利用外部函数引起的。Snowflake通过将来自查询的一批行传递给外部函数来优化性能,而不是逐行调用它们。
对于你的示例代码,你的外部函数收到的请求体将类似于以下内容:
{
"data": [
[0, "你是谁?"],
[1, "你是什么?"],
[2, "你好吗?"]
]
}
请求体是一个简单的JSON对象,其中包含一个名为data
的单一键,该键包含表示使用函数的查询中的一批行(行集)的JSON数组。数组中的每个元素本身都是一个JSON数组,其中包含行索引,后跟函数调用的实际参数。
你的外部函数需要遍历每个行元素,并为每个行单独提供结果。例如,如果你的外部函数进行API调用,它将分别为每个行进行这些调用。
你遇到的错误表明,你的响应仅包含了单个行的结果,而Snowflake期望每个3个行的响应。以下是响应应该如何看起来的示例:
{
"data": [
[0, "Quins"],
[1, "我是一个人类"],
[2, "我很聪明!"]
]
}
在此响应中,每个行再次表示为JSON数组,其中第一个元素是来自请求的对应行的索引,后跟该行的结果。
要获取更详细的信息,你可以参考Snowflake文档:
https://docs.snowflake.com/en/sql-reference/external-functions-data-format
英文:
The issue you're encountering is not caused by the CTE, but rather by how Snowflake utilises external functions. Snowflake optimises performance by calling external functions with batches of rows from your query, instead of calling them row by row.
For your example code, the request body that your external function received would have looked something like this:
{
"data": [
[0, "who are you?"],
[1, "what are you?"],
[2, "how are you?"]
]
}
The request body is a simple JSON object with a single key named data
, which contains a JSON array representing a batch of rows (rowset) from the query that uses the function. Each element in the array is itself a JSON array, with a row index followed by the actual arguments from the function call.
Your external function needs to iterate over each row element and provide a result for each row individually. For instance, if your external function makes API calls, it would make those calls separately for each row.
The error you encountered suggests that your response only included the result for a single row, whereas Snowflake expected a response for each of the 3 rows. Here's an example of how the response should look:
{
"data": [
[0, "Quins"],
[1, "I'm a human"],
[2, "I'm brilliant!"]
]
}
In this response, each row is again represented as a JSON array, where the first element is the index of the corresponding row from the request, followed by the result for that row.
For more detailed information, you can refer to the Snowflake documentation:
https://docs.snowflake.com/en/sql-reference/external-functions-data-format
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论