F1-Score and Accuracy for Text-Similarity
Question
I am trying to understand how to calculate F1-Score and accuracy between texts while fine-tuning a QA model.
Let's assume we have this:
    labels = ["I am fine", "He was born in 1995", "The Eiffel tower", "dogs"]
    preds = ["I am fine", "born in 1995", "Eiffel", "dog"]
In this case, it is clear that the predictions are pretty accurate, but how can I measure the F1-Score here? "dog" and "dogs" are not an exact match, but they are very similar.
Answer 1
Score: 0
One popular metric for text similarity is the Levenshtein distance or edit distance, which measures the minimum number of single-character edits (insertions, deletions, or substitutions) required to transform one string into another.
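To make the definition concrete, here is a minimal pure-Python sketch of edit distance using the standard dynamic-programming recurrence. The names `edit_distance` and `normalized_similarity` are illustrative choices, not part of any library; the normalization (one minus distance over the longer length) is one common convention, and treating two empty strings as identical is an assumption.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the classic dynamic-programming recurrence."""
    # prev[j] is the distance between the previous prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute, or keep when equal
            ))
        prev = curr
    return prev[-1]


def normalized_similarity(a: str, b: str) -> float:
    """1 - distance / longer length; two empty strings count as identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - edit_distance(a, b) / longest


print(edit_distance("dogs", "dog"))          # 1
print(normalized_similarity("dogs", "dog"))  # 0.75
```

"dogs" and "dog" differ by a single deletion, so their similarity is 0.75; a threshold of 0.8 would therefore still reject that pair.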
Try the code below, and adjust threshold to suit your requirements.
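    import Levenshtein  # third-party package, e.g. pip install Levenshtein

    def text_similarity_evaluation(labels, preds, threshold=0.8):
        tp, fp, fn = 0, 0, 0
        for label, pred in zip(labels, preds):
            longest = max(len(label), len(pred))
            # Treat two empty strings as identical to avoid dividing by zero.
            similarity_score = 1.0 if longest == 0 else 1 - Levenshtein.distance(label, pred) / longest
            if similarity_score >= threshold:
                tp += 1
            else:
                fp += 1
        # With one prediction per label, every miss counts as both a false
        # positive and a false negative, so precision, recall, and F1 all
        # equal the share of pairs that reached the threshold.
        fn = len(labels) - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        # Guard against 0 / 0 when no pair reaches the threshold.
        f1_score = 2 * (precision * recall) / (precision + recall) if precision + recall else 0.0
        return precision, recall, f1_score

    # Example usage
    labels = ["I am fine", "He was born in 1995", "The Eiffel tower", "dogs"]
    preds = ["I am fine", "born in 1995", "Eiffel", "dog"]
    precision, recall, f1_score = text_similarity_evaluation(labels, preds, threshold=0.8)
    print("Precision:", precision)
    print("Recall:", recall)
    print("F1-Score:", f1_score)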
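As a point of comparison, extractive QA benchmarks such as SQuAD usually report a token-overlap F1 alongside exact match rather than a thresholded edit distance. The sketch below is a different technique from the one in this answer and is only a simplified illustration: `token_f1` is a hypothetical helper, tokenization is assumed to be lowercasing plus whitespace splitting, and the punctuation and article stripping done by the official evaluation script is omitted.

```python
from collections import Counter


def token_f1(label: str, pred: str) -> float:
    """Token-overlap F1 between one reference answer and one prediction."""
    label_tokens = label.lower().split()
    pred_tokens = pred.lower().split()
    if not label_tokens or not pred_tokens:
        # Both empty counts as a match; only one empty counts as a miss.
        return float(label_tokens == pred_tokens)
    # Multiset intersection: a shared token counts at most as often
    # as it appears in both strings.
    overlap = sum((Counter(label_tokens) & Counter(pred_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(label_tokens)
    return 2 * precision * recall / (precision + recall)


labels = ["I am fine", "He was born in 1995", "The Eiffel tower", "dogs"]
preds = ["I am fine", "born in 1995", "Eiffel", "dog"]

scores = [token_f1(label, pred) for label, pred in zip(labels, preds)]
exact_match = sum(label == pred for label, pred in zip(labels, preds)) / len(labels)
print("Per-example F1:", scores)
print("Mean F1:", sum(scores) / len(scores))
print("Exact match:", exact_match)
```

Because whole tokens are compared, "dog" against "dogs" scores 0 here, while "born in 1995" still earns partial credit. If near-matches like plural forms should count, stemming the tokens first or falling back to a character-level similarity are possible options.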