July 31, 2023, 18:20:25

F1-Score and Accuracy for Text-Similarity

Question

I am trying to understand how to calculate F1-Score and accuracy between texts while fine-tuning a QA model.

Let's assume we have this:

labels = [I am fine, He was born in 1995, The Eiffel tower, dogs]

preds = [I am fine, born in 1995, Eiffel, dog]

In this case, the predictions are clearly close to the labels, but how can I measure the F1-Score here? "dog" and "dogs" are not an exact match, yet they are very similar.

Answer 1

Score: 0

One popular metric for text similarity is the Levenshtein distance or edit distance, which measures the minimum number of single-character edits (insertions, deletions, or substitutions) required to transform one string into another.
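To make the definition concrete, here is a minimal pure-Python sketch of edit distance using the standard dynamic-programming recurrence. The function names `levenshtein` and `similarity` are illustrative, not part of any library, and the normalization by the longer string's length is one common choice among several.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    # previous[j] is the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into the empty string takes i deletions
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep a matching character
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Normalized similarity in [0, 1]; two empty strings count as identical."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1 - levenshtein(a, b) / longest


print(levenshtein("dogs", "dog"))  # 1
print(similarity("dogs", "dog"))   # 0.75
```

With this normalization, "dogs" versus "dog" scores 0.75, so whether the pair counts as a match depends entirely on the chosen threshold.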

Try the code below, and adjust threshold to suit your requirements.

import Levenshtein  # third-party package, e.g. pip install Levenshtein


def text_similarity_evaluation(labels, preds, threshold=0.8):
    """Count a prediction as correct when its similarity to the label reaches threshold."""
    tp, fp = 0, 0
    for label, pred in zip(labels, preds):
        longest = max(len(label), len(pred))
        # Normalized similarity in [0, 1]; two empty strings count as identical.
        similarity_score = 1 - Levenshtein.distance(label, pred) / longest if longest else 1.0
        if similarity_score >= threshold:
            tp += 1
        else:
            fp += 1
    # When labels and preds have the same length, fn equals fp, so precision,
    # recall, and F1 all reduce to the fraction of pairs that matched.
    fn = len(labels) - tp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Guard against division by zero when nothing matches.
    f1_score = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1_score


# Example usage
labels = ["I am fine", "He was born in 1995", "The Eiffel tower", "dogs"]
preds = ["I am fine", "born in 1995", "Eiffel", "dog"]
precision, recall, f1_score = text_similarity_evaluation(labels, preds, threshold=0.8)
print("Precision:", precision)
print("Recall:", recall)
print("F1-Score:", f1_score)
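As an alternative to thresholding edit distance, QA evaluation often scores each answer by token overlap, as in SQuAD-style evaluation. The sketch below is not the method from the answer above. It is a simplified version that only lowercases and splits on whitespace, and the helper names `token_f1` and `exact_match` are illustrative.

```python
from collections import Counter


def token_f1(label: str, pred: str) -> float:
    """F1 over the tokens shared by a reference answer and a prediction."""
    label_tokens = label.lower().split()
    pred_tokens = pred.lower().split()
    if not label_tokens or not pred_tokens:
        # Both empty counts as a perfect match, one empty as a complete miss.
        return float(label_tokens == pred_tokens)
    # Multiset intersection: each token counts as often as it occurs in both.
    overlap = sum((Counter(label_tokens) & Counter(pred_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(label_tokens)
    return 2 * precision * recall / (precision + recall)


def exact_match(label: str, pred: str) -> float:
    """1.0 when the token sequences are identical after lowercasing, else 0.0."""
    return float(label.lower().split() == pred.lower().split())


labels = ["I am fine", "He was born in 1995", "The Eiffel tower", "dogs"]
preds = ["I am fine", "born in 1995", "Eiffel", "dog"]

f1_scores = [token_f1(l, p) for l, p in zip(labels, preds)]
em_scores = [exact_match(l, p) for l, p in zip(labels, preds)]
mean_f1 = sum(f1_scores) / len(f1_scores)
accuracy = sum(em_scores) / len(em_scores)

print("Per-example token F1:", [round(s, 2) for s in f1_scores])
print("Mean token F1:", round(mean_f1, 4))
print("Exact-match accuracy:", accuracy)
```

Partial answers such as "born in 1995" get partial credit under this scheme, but "dog" versus "dogs" scores 0 because the tokens differ. Crediting that pair would need stemming or a character-level measure such as the edit-distance similarity above.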
