TheFuzz Library in Python - Ratio Function

Question

I'm trying to understand how the ratio is calculated in this function. I've searched all over the internet and even asked ChatGPT about it, but I can't find a clear answer.

The issue is that, for example:

from thefuzz import fuzz  # the fuzzywuzzy library behaves the same

a = 'house'
b = 'mouse'

print(fuzz.ratio(a, b))  # 80

This returns 80, which can be reproduced with 100 * (max(len(a), len(b)) - LevDist) / max(len(a), len(b)), where LevDist is the Levenshtein distance.
However, if we change b to 'mousee', the ratio returns 73 and the formula no longer works.
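The mismatch can be checked with a minimal, dependency-free sketch that computes the classic Levenshtein distance (insertions, deletions and substitutions each cost 1) and plugs it into the formula above. The helper names `levenshtein` and `question_formula` are hypothetical and used only for illustration; this is not how thefuzz computes its score.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic Levenshtein distance: insert, delete and substitute each cost 1."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def question_formula(a: str, b: str) -> float:
    """Formula tried in the question: 100 * (max_len - LevDist) / max_len."""
    longest = max(len(a), len(b))
    return 100 * (longest - levenshtein(a, b)) / longest


print(levenshtein("house", "mouse"), question_formula("house", "mouse"))
print(levenshtein("house", "mousee"), question_formula("house", "mousee"))
```

For 'mouse' the distance is 1 and the formula gives 80.0, matching fuzz.ratio. For 'mousee' the distance is 2 and the formula gives about 66.7 rather than 73, so the first match is a coincidence rather than evidence that fuzz.ratio uses this formula.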

I've also tried other formulas, such as the one explained in this DataCamp article, but it isn't consistent with this case or others.

Can someone help me understand how the ratio is calculated? Thanks.

Answer 1

Score: 1

Fuzzywuzzy uses python-Levenshtein if it is available and otherwise falls back to difflib. The two use different algorithms to determine string similarity:

  1. If python-Levenshtein is available, Levenshtein.ratio is used. This calculates a normalized version of the InDel distance (similar to the Levenshtein distance, but without substitutions). The normalization is performed as: 100 * (1 - (InDel_dist / (len(a) + len(b))))

  2. If it isn't available, difflib.SequenceMatcher.ratio is used instead.
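Applied to the question's examples, the InDel formula reproduces both scores once the 0-100 result is rounded to the nearest integer (fuzz.ratio rounds its score, so 100 * (1 - 3/11) ≈ 72.7 becomes 73). The sketch below is a hypothetical pure-Python reimplementation, not the library's source: it derives the InDel distance from the longest common subsequence (LCS), using InDel = len(a) + len(b) - 2 * LCS, and compares it with `difflib.SequenceMatcher.ratio`, which the Python documentation defines as 2*M/T (M = matched characters, T = total characters in both strings). All function names are illustrative.

```python
from difflib import SequenceMatcher


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                curr.append(prev[j - 1] + 1)  # extend the diagonal match
            else:
                curr.append(max(prev[j], curr[j - 1]))  # skip a char in a or b
        prev = curr
    return prev[-1]


def indel_distance(a: str, b: str) -> int:
    """Minimum number of insertions/deletions (no substitutions) turning a into b."""
    return len(a) + len(b) - 2 * lcs_length(a, b)


def indel_ratio(a: str, b: str) -> int:
    """100 * (1 - InDel_dist / (len(a) + len(b))), rounded to an integer."""
    total = len(a) + len(b)
    if total == 0:
        return 100  # assumption: treat two empty strings as identical
    return round(100 * (1 - indel_distance(a, b) / total))


def difflib_ratio(a: str, b: str) -> int:
    """difflib fallback: SequenceMatcher.ratio() is 2*M/T; scaled and rounded."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


for b in ("mouse", "mousee"):
    print(b, indel_distance("house", b), indel_ratio("house", b), difflib_ratio("house", b))
```

For 'house' vs 'mouse' the LCS is "ouse", so the InDel distance is 2 (delete h, insert m) and the score is 80; for 'mousee' it is 3 and the score is 73. Both backends agree on these two inputs, but SequenceMatcher builds its matches from contiguous blocks rather than computing a true LCS, so the two scores may differ on other strings.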

huangapple
  • Published on June 1, 2023, 02:15:49
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76376293.html