Redshift SQL Query - Two data sources with different characters, e.g. François vs. Francois
Question
I am writing a Redshift SQL query to pull data from two different tables with similar information. One table includes person names with accents, umlauts, and eñes (ñ), while the other does not. For example, one table has François and the other table has Francois for the same person.
I am looking for a SQL function that would show all results without accents and other similar encoding (not sure if "encoding" is the right terminology here).
I know how to find and replace a specific value, but that's not what I'm looking for here. I want to systematically remove all accents, umlauts, eñes, etc. I found that COLLATE can be used in SQL Server, but it looks like COLLATE only deals with case sensitivity in Redshift.
Answer 1
Score: 1
I'm unaware of any special function for this, but TRANSLATE can be used. See Translate
SELECT
    source_table.person_name
  , translated_table.name_without_accents
FROM source_table
JOIN (
    SELECT
        person_id
        /* the two lists must line up character for character; ñ/Ñ added */
      , TRANSLATE(person_name,
          'áàäâãåéèëêíìïîóòöôõúùüûçñÁÀÄÂÃÅÉÈËÊÍÌÏÎÓÒÖÔÕÚÙÜÛÇÑ',
          'aaaaaaeeeeiiiiooooouuuucnAAAAAAEEEEIIIIOOOOOUUUUCN') AS name_without_accents
    FROM source_table
    /* regex to locate the names containing accented chars */
    WHERE person_name ~ '[áàäâãåéèëêíìïîóòöôõúùüûçñÁÀÄÂÃÅÉÈËÊÍÌÏÎÓÒÖÔÕÚÙÜÛÇÑ]'
) AS translated_table ON source_table.person_id = translated_table.person_id;
Or, create a view along these lines:
CREATE VIEW all_names AS
SELECT
    person_id
  , person_name
    /* the two lists must line up character for character; ñ/Ñ added */
  , TRANSLATE(person_name,
      'áàäâãåéèëêíìïîóòöôõúùüûçñÁÀÄÂÃÅÉÈËÊÍÌÏÎÓÒÖÔÕÚÙÜÛÇÑ',
      'aaaaaaeeeeiiiiooooouuuucnAAAAAAEEEEIIIIOOOOOUUUUCN') AS name_without_accents
FROM source_table;
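Since the question is about matching names across two tables, the same TRANSLATE expression could also be used directly in the join condition. The following is only a rough sketch: other_table and its person_name column are hypothetical names standing in for the second, unaccented table, and the character lists cover only the letters shown, so any accented letter missing from them passes through unchanged.

```sql
-- Quick check that the two character lists are aligned.
SELECT TRANSLATE('François',
         'áàäâãåéèëêíìïîóòöôõúùüûçñÁÀÄÂÃÅÉÈËÊÍÌÏÎÓÒÖÔÕÚÙÜÛÇÑ',
         'aaaaaaeeeeiiiiooooouuuucnAAAAAAEEEEIIIIOOOOOUUUUCN') AS check_name;

-- Hypothetical: other_table(person_name) stores the unaccented spelling.
SELECT
    s.person_id
  , s.person_name AS accented_name
  , o.person_name AS plain_name
FROM source_table AS s
JOIN other_table AS o
  ON TRANSLATE(s.person_name,
       'áàäâãåéèëêíìïîóòöôõúùüûçñÁÀÄÂÃÅÉÈËÊÍÌÏÎÓÒÖÔÕÚÙÜÛÇÑ',
       'aaaaaaeeeeiiiiooooouuuucnAAAAAAEEEEIIIIOOOOOUUUUCN') = o.person_name;
```

If the first statement returns anything other than Francois, the two lists are out of step; a single missing character in the second list shifts every later mapping by one position.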