TheFuzz Library in Python - Ratio Function
Question
I'm trying to understand how the ratio in `fuzz.ratio` is calculated. I've searched all over the internet and even asked ChatGPT, but I can't find a clear answer.
For example:
```python
from thefuzz import fuzz  # the fuzzywuzzy library behaves the same

a = 'house'
b = 'mouse'
print(fuzz.ratio(a, b))  # 80
```
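This returns 80, which matches `(max(len(a), len(b)) - LevDist) / max(len(a), len(b))`, where `LevDist` is the Levenshtein distance (here 1, so (5 - 1) / 5 = 0.8).
However, if we change b to 'mousee', `fuzz.ratio` returns 73, while the formula gives (6 - 2) / 6 ≈ 0.67, so it no longer works.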
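The formula from the question can be reproduced with a small sketch. It assumes the textbook dynamic-programming Levenshtein distance (insertions, deletions and substitutions all cost 1); the helper names `levenshtein` and `lev_formula_ratio` are hypothetical and not part of thefuzz:

```python
def levenshtein(a: str, b: str) -> int:
    """Classic Levenshtein distance: insertions, deletions and substitutions each cost 1."""
    prev = list(range(len(b) + 1))  # distances from "" to every prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if the characters are equal)
            ))
        prev = curr
    return prev[-1]


def lev_formula_ratio(a: str, b: str) -> int:
    """Hypothetical helper: the question's formula (max_len - LevDist) / max_len, scaled to 0-100."""
    longest = max(len(a), len(b))
    return round(100 * (longest - levenshtein(a, b)) / longest)


print(lev_formula_ratio("house", "mouse"))   # 80, same as fuzz.ratio
print(lev_formula_ratio("house", "mousee"))  # 67, but fuzz.ratio reports 73
```

This reproduces 80 for 'mouse' but 67 for 'mousee', so a Levenshtein distance normalized by the longer string cannot explain the 73.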
I've also tried other formulas, such as the one explained in this DataCamp article, but it isn't consistent with this case or others.
Can someone help me understand how the ratio is calculated? Thanks.
Answer 1
Score: 1
Fuzzywuzzy uses python-Levenshtein if it is available and otherwise falls back to difflib. The two use different algorithms to determine string similarity:
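- If python-Levenshtein is available, `Levenshtein.ratio` is used. This calculates a normalized version of the InDel distance (similar to the Levenshtein distance, but substitutions are not allowed, only insertions and deletions). The normalization is performed as `100 * (1 - (InDel_dist / (len(a) + len(b))))`, and `fuzz.ratio` rounds the result to an integer. For 'house' vs 'mousee' the InDel distance is 3 (delete 'h', insert 'm', insert 'e'), giving 100 * (1 - 3 / 11) ≈ 72.7, which rounds to 73.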
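- If it isn't available, it uses `difflib.SequenceMatcher.ratio`, which returns `2.0 * M / T`, where M is the number of matched characters and T is the total number of characters in both strings.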
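The two bullets can be checked with a standard-library-only sketch. It is not the library's source: it assumes the InDel distance can be obtained from the longest common subsequence (InDel_dist = len(a) + len(b) - 2 * LCS length, since every character outside the LCS must be deleted or inserted), applies the normalization above, and compares the result with `difflib.SequenceMatcher(None, a, b).ratio()`. The helpers `lcs_length`, `indel_distance` and `indel_ratio` are hypothetical names, and rounding with Python's `round` is an assumption about how the integer score is produced.

```python
from difflib import SequenceMatcher


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence (dynamic programming, O(len(a) * len(b)))."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def indel_distance(a: str, b: str) -> int:
    """Edits needed with insertions and deletions only: every char outside the LCS is deleted or inserted."""
    return len(a) + len(b) - 2 * lcs_length(a, b)


def indel_ratio(a: str, b: str) -> int:
    """Hypothetical helper: 100 * (1 - InDel_dist / (len(a) + len(b))), rounded to an integer."""
    total = len(a) + len(b)
    if total == 0:
        return 100  # two empty strings: treated as identical (assumption)
    return round(100 * (1 - indel_distance(a, b) / total))


for a, b in [("house", "mouse"), ("house", "mousee")]:
    difflib_score = round(100 * SequenceMatcher(None, a, b).ratio())
    print(a, b, indel_distance(a, b), indel_ratio(a, b), difflib_score)
# house mouse 2 80 80
# house mousee 3 73 73
```

Both approaches give 80 and 73 for these inputs. They are not interchangeable in general, though: `SequenceMatcher` picks the longest contiguous matching block first and then recurses on both sides, so its matches are not guaranteed to form a longest common subsequence, and its score can differ from the InDel-based ratio on other inputs.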