SQL函数DIFFERENCE返回有趣的分数。

huangapple go评论64阅读模式
英文:

SQL Function DIFFERENCE returns interesting scores

问题

可以有人解释为什么SQL函数(在SQL Server 2019中使用)返回对我来说似乎是违反直觉的结果吗?以下是查询和分数:

SELECT 
    DIFFERENCE('Good', 'Good Samaritans'); --结果为4(高得分匹配)

SELECT 
    DIFFERENCE('Samaritans', 'Good Samaritans'); --结果为1(低得分匹配)

SELECT 
    DIFFERENCE('Sam', 'Good Samaritans'); --结果为2(比上面的得分高!)

我理解DIFFERENCE使用SOUNDEX来以发音方式匹配辅音,但上述结果似乎非常奇怪,特别是第二个查询。这与空格和后续字符串有关吗?

英文:

Can anyone explain why the SQL function (using SQL Server 2019) returns results that to me appear to be counter intuitive? Here are the queries and the scores:

SELECT 
    DIFFERENCE('Good', 'Good Samaritans'); --Result is 4 (High score match)

SELECT 
    DIFFERENCE('Samaritans', 'Good Samaritans'); --Result is 1 (Low score match)

SELECT 
    DIFFERENCE('Sam', 'Good Samaritans'); --Result is 2 (A higher score than above!)

I understand DIFFERENCE uses SOUNDEX to match consonants phonetically, but the results above seem very odd particularly with the second query. Is it something to do with the space and the proceeding string?

答案1

得分: 4

如果更改单词顺序,您会看到第一个词存在偏向。然后,如果考虑SOUNDEX,您将开始理解原因。另请阅读下面的参考。

SOUNDEX将包含字母数字字符串转换为基于该字符串在英语中发音时的方式的四字符代码。代码的第一个字符是character_expression的第一个字符,转换为大写。代码的第二到第四个字符是表示表达式中的字母的数字。除非它们是字符串的第一个字母,否则将忽略字母A、E、I、O、U、H、W和Y。如果需要,在末尾添加零以产生一个四字符代码。有关SOUNDEX代码的更多信息,请参阅Soundex索引系统

fiddle

difference的期望可能过高。

英文:

If you change the order of words you will see there is a bias on the first word. Then if you consider SOUNDEX you will begin to understand why. Also read the reference below.

> SOUNDEX converts an alphanumeric string to a four-character code that is based on how the string sounds when spoken in English. The
> first character of the code is the first character of
> character_expression, converted to upper case. The second through
> fourth characters of the code are numbers that represent the letters
> in the expression. The letters A, E, I, O, U, H, W, and Y are ignored
> unless they are the first letter of the string. Zeroes are added at
> the end if necessary to produce a four-character code. For more
> information about the SOUNDEX code, see The Soundex Indexing
> System

-- Bias based on left to right order of words
SELECT 10 id, DIFFERENCE('Good', 'Good Samaritans') --Result is 4 (High score match)
union all
SELECT 11 id, DIFFERENCE('Good', 'Samaritans Good') --Result is 1 (Low score match)
union all
SELECT 20, DIFFERENCE('Samaritans', 'Good Samaritans') --Result is 1 (Low score match)
union all
SELECT 21, DIFFERENCE('Samaritans', 'Samaritans Good') --Result is 4 (High score match)
union all
SELECT 30, DIFFERENCE('Sam', 'Good Samaritans') --Result is 2 (On the upper low side)

id (No column name)
10 4
11 1
20 1
21 4
30 2

fiddle

Your expectations from difference may be too high.

huangapple
  • 本文由 发表于 2023年6月2日 14:16:20
  • 转载请务必保留本文链接:https://go.coder-hub.com/76387583.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定