英文:
What are other ways of formatting the SQL in a JavaScript stored procedure in Snowflake?
问题
我在Snowflake中创建了一个存储过程,允许我识别给定表中的重复数据行并停止后续代码的运行。
此存储过程的参数包括数据库、模式、表以及自然键。因为在一个表中查找重复行可能需要多个自然键在ROW_NUMBER()函数中使用。
CREATE OR REPLACE PROCEDURE FIND_DUPLICATE_ROWS(DB_NAME STRING, SCHEMA_NAME STRING, TABLE_NAME STRING, NATURAL_KEYS STRING)
RETURNS STRING
LANGUAGE JAVASCRIPT
EXECUTE AS CALLER
AS
$$
// 将自然键拆分为数组并去除空格
var keysArray = NATURAL_KEYS.split(',').map(function(item) { return item.trim(); });
// 将自然键格式化为逗号分隔的字符串以供查询使用
var NaturalkeyString = keysArray.join(', ');
// 组装SQL查询以识别重复行
var sql_query =
'WITH duplicates AS (' +
'SELECT ' + NaturalkeyString +
', ROW_NUMBER() OVER (PARTITION BY ' + NaturalkeyString + ' ORDER BY ' + NaturalkeyString + ') AS RowNumber ' +
'FROM "' + DB_NAME + '"."' + SCHEMA_NAME + '"."' + TABLE_NAME + '"' +
') ' +
'SELECT * FROM duplicates WHERE RowNumber > 1';
var stmt = snowflake.createStatement({sqlText: sql_query});
var result = stmt.execute();
// 检查是否存在重复行
if(result.next()){
throw 'ERROR: 在表 ' + TABLE_NAME + ' 中找到重复行';
}
else{
return '在表 ' + TABLE_NAME + ' 中未找到重复行';
}
$$;
-- 测试存储过程:
CALL FIND_DUPLICATE_ROWS('SUPA_DB', 'RAW', 'DIM_EMP_DETAILS', 'ID, EMP_NAME, ORG');
-- 如果存储过程 "FIND_DUPLICATE_ROWS" 正常工作,调用 "FIND_DUPLICATE_ROWS" 存储过程后,以下代码不应执行:
SELECT 7887;
我的问题是,我不喜欢如何进行SQL拼接以使其工作,是否有更清晰的方法可以做到这一点?
英文:
I created a stored procedure in Snowflake that lets me identify duplicate rows of data in a given table and stop the subsequent code from running.
The parameters for this stored procedure are the Database, Schema, Table, and one or more Natural keys from the respective table since finding duplicate rows in one table may require more than 1 natural key in the ROW_NUMBER() function.
CREATE OR REPLACE PROCEDURE FIND_DUPLICATE_ROWS(DB_NAME STRING, SCHEMA_NAME STRING, TABLE_NAME STRING, NATURAL_KEYS STRING)
RETURNS STRING
LANGUAGE JAVASCRIPT
EXECUTE AS CALLER
AS
$$
// Split the Natural keys into an array and trim whitespaces
var keysArray = NATURAL_KEYS.split(',').map(function(item) { return item.trim(); });
// Format the Natural keys into comma separated string for the query
var NaturalkeyString = keysArray.join(', ');
// Assemble the SQL query to identify duplicate rows
var sql_query =
'WITH duplicates AS (' +
'SELECT ' + NaturalkeyString +
', ROW_NUMBER() OVER (PARTITION BY ' + NaturalkeyString + ' ORDER BY ' + NaturalkeyString + ') AS RowNumber ' +
'FROM "' + DB_NAME + '"."' + SCHEMA_NAME + '"."' + TABLE_NAME + '"' +
') ' +
'SELECT * FROM duplicates WHERE RowNumber > 1';
var stmt = snowflake.createStatement({sqlText: sql_query});
var result = stmt.execute();
// Check if there are duplicate rows
if(result.next()){
throw 'ERROR: Duplicate rows found in the table ' + TABLE_NAME;
}
else{
return 'No duplicate rows found in the table ' + TABLE_NAME;
}
$$;
-- Test the Stored Procedure:
CALL FIND_DUPLICATE_ROWS('SUPA_DB', 'RAW', 'DIM_EMP_DETAILS', 'ID,EMP_NAME,ORG');
--If the "FIND_DUPLICATE_ROWS" stored procedure works the following code should NOT execute after calling "FIND_DUPLICATE_ROWS" stored procedure:
SELECT 7887;
My issue is that I don't like how I have to concatenate the SQL to make it work, is there a cleaner way to do this?
答案1
得分: 0
I found my mistake, instead of backticks (`) I was using single quotes ('), it was just hard for me to see. Then by using JavaScript Template Literals with the syntax:
`String Text ${Expression} string text`
I was able to format my stored procedure appropriately as shown below:
CREATE OR REPLACE PROCEDURE FIND_DUPLICATE_ROWS(
DB_NAME STRING,
SCHEMA_NAME STRING,
TABLE_NAME STRING,
NATURAL_KEYS STRING
)
RETURNS STRING
LANGUAGE JAVASCRIPT
EXECUTE AS CALLER
AS
$$
// Split the Natural keys into an array and trim whitespaces
var keysArray = NATURAL_KEYS.split(',').map(function(item) { return item.trim(); });
// Format the Natural keys into a comma-separated string for the query
var NaturalKeyString = keysArray.join(', ');
// Assemble the SQL query to identify duplicate rows
var sql_query = `
SELECT
${NaturalKeyString},
ROW_NUMBER() OVER(PARTITION BY ${NaturalKeyString} ORDER BY ${NaturalKeyString}) AS RowNumber
FROM ${DB_NAME}.${SCHEMA_NAME}.${TABLE_NAME}
QUALIFY RowNumber > 1;`;
var stmt = snowflake.createStatement({sqlText: sql_query});
var result = stmt.execute();
error_message = `"ERROR: Duplicate rows found in ${DB_NAME}.${SCHEMA_NAME}.${TABLE_NAME}"`;
// Check if there are duplicate rows
if(result.next()){
throw error_message;
}
else{
return 0;
}
$$;
-- SUPA_DB.RAW.DIM_EMP_DETAILS
CALL FIND_DUPLICATE_ROWS('SUPA_DB', 'RAW', 'DIM_EMP_DETAILS', 'ID,EMP_NAME,ORG');
-- If the "FIND_DUPLICATE_ROWS" stored procedure works the following code should NOT execute after calling "FIND_DUPLICATE_ROWS" stored procedure:
SELECT 7887;
英文:
I found my mistake, instead of backticks ( ` ) I was using single quotes ( ' ) it was just hard for me to see. Then by using JavaScript Template Literals with the syntax:
`String Text ${Expression} string text`
I was able to format my stored procedure appropriately as shown below:
CREATE OR REPLACE PROCEDURE FIND_DUPLICATE_ROWS(
DB_NAME STRING,
SCHEMA_NAME STRING,
TABLE_NAME STRING,
NATURAL_KEYS STRING
)
RETURNS STRING
LANGUAGE JAVASCRIPT
EXECUTE AS CALLER
AS
$$
// Split the Natural keys into an array and trim whitespaces
var keysArray = NATURAL_KEYS.split(',').map(function(item) { return item.trim(); });
// Format the Natural keys into comma separated string for the query
var NaturalKeyString = keysArray.join(', ');
// Assemble the SQL query to identify duplicate rows
var sql_query = `
SELECT
${NaturalKeyString},
ROW_NUMBER() OVER(PARTITION BY ${NaturalKeyString} ORDER BY ${NaturalKeyString}) AS RowNumber
FROM ${DB_NAME}.${SCHEMA_NAME}.${TABLE_NAME}
QUALIFY RowNumber > 1;`;
var stmt = snowflake.createStatement({sqlText: sql_query});
var result = stmt.execute();
error_message = `"ERROR: Duplicate rows found in ${DB_NAME}.${SCHEMA_NAME}.${TABLE_NAME}"`;
// Check if there are duplicate rows
if(result.next()){
throw error_message;
}
else{
return 0;
}
$$;
-- SUPA_DB.RAW.DIM_EMP_DETAILS
CALL FIND_DUPLICATE_ROWS('SUPA_DB', 'RAW', 'DIM_EMP_DETAILS', 'ID,EMP_NAME,ORG');
--If the "FIND_DUPLICATE_ROWS" stored procedure works the following code should NOT execute after calling "FIND_DUPLICATE_ROWS" stored procedure:
SELECT 7887;
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论