June 1, 2023 02:15:49

TheFuzz Library in Python - Ratio Function

Question

I'm trying to understand how the ratio is calculated in this function. I've been searching all over the internet and even asking ChatGPT about it, but I can't find a single answer.

The issue is that, for example:

from thefuzz import fuzz  # the fuzzywuzzy library behaves the same
a = 'house'
b = 'mouse'
print(fuzz.ratio(a, b))

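This returns 80, which can be reproduced with 100 * (max(len(a), len(b)) - LevDist) / max(len(a), len(b)), where LevDist is the Levenshtein distance (here 1, with both strings of length 5).
However, if we change b to 'mousee', the ratio returns 73 and the formula no longer works: the Levenshtein distance is 2 and the longer string has length 6, which gives about 66.7.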
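
The mismatch described above can be checked with a small sketch. The helper names `levenshtein` and `max_len_ratio` are made up for illustration and are not part of thefuzz; the distance is a plain textbook dynamic-programming implementation, so no extra package is assumed.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic Levenshtein distance: insertions, deletions and substitutions cost 1."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def max_len_ratio(a: str, b: str) -> float:
    """The formula guessed in the question (hypothetical helper, not a library call)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100 * (longest - levenshtein(a, b)) / longest


print(round(max_len_ratio('house', 'mouse')))   # agrees with the 80 reported above
print(round(max_len_ratio('house', 'mousee')))  # does not agree with the 73 reported above
```

The second call gives roughly 67 rather than 73, so a normalization by the longer string's length cannot be what the library does.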

I've also tried other formulas, such as the one explained in this DataCamp article, but they are not consistent with this case or with others.

Can someone help me understand how the ratio is calculated? Thanks

Answer 1

Score: 1

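Fuzzywuzzy uses python-Levenshtein if it is available and otherwise falls back to difflib. The two use different algorithms to determine string similarity:

If python-Levenshtein is available, Levenshtein.ratio is used. This calculates a normalized version of the InDel distance (similar to the Levenshtein distance, but substitutions are not allowed, so replacing a character costs one deletion plus one insertion). The normalization is performed as: 100 * (1 - (InDel_dist / (len(a) + len(b)))). For 'house' and 'mouse' the InDel distance is 2, giving 100 * (1 - 2/10) = 80; for 'house' and 'mousee' it is 3, giving 100 * (1 - 3/11) ≈ 72.7, which rounds to 73.
If it isn't available, difflib.SequenceMatcher.ratio is used.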
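
To make the normalization concrete, here is a minimal pure-Python sketch of it. It is a reimplementation for illustration, not the library's code: `lcs_length`, `indel_distance` and `indel_ratio` are hypothetical helper names, and the final rounding to an integer is an assumption that happens to match the 80 and 73 reported in the question. The InDel distance is computed from the longest common subsequence (LCS), since every character outside the LCS has to be deleted from one string or inserted from the other.

```python
from difflib import SequenceMatcher


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def indel_distance(a: str, b: str) -> int:
    """Insertions plus deletions needed to turn a into b (no substitutions)."""
    return len(a) + len(b) - 2 * lcs_length(a, b)


def indel_ratio(a: str, b: str) -> float:
    """100 * (1 - InDel_dist / (len(a) + len(b))), as described in the answer."""
    total = len(a) + len(b)
    if total == 0:
        return 100.0  # two empty strings are treated as identical here
    return 100 * (1 - indel_distance(a, b) / total)


for other in ('mouse', 'mousee'):
    by_indel = round(indel_ratio('house', other))
    by_difflib = round(100 * SequenceMatcher(None, 'house', other).ratio())
    print(other, by_indel, by_difflib)
```

For these two pairs the difflib fallback produces the same rounded values, although the two algorithms are not guaranteed to agree on every input.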
