How can I get the total precision/recall/F1 score from the confusion matrix?
Question
I have the following confusion matrix for an SVC model computed with sklearn:
Classification report:

              precision     recall   f1-score   support

           0  0.7975000  0.5907407  0.6787234       540
           1  0.6316667  0.8239130  0.7150943       460

    accuracy                        0.6980000      1000
   macro avg  0.7145833  0.7073269  0.6969089      1000
weighted avg  0.7212167  0.6980000  0.6954540      1000
How can I compute the overall precision/recall and F1-score starting only from the values in the matrix? That is, without importing the function from sklearn, just computing it "by hand" from the confusion matrix itself?
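(For reference, the overall rows of the report above follow directly from the per-class rows: "macro avg" is the unweighted mean over the classes and "weighted avg" is the support-weighted mean. A minimal by-hand check, using only the per-class values shown in the report:)

# Per-class values copied from the report above (class 0, class 1)
precision = [0.7975000, 0.6316667]
recall    = [0.5907407, 0.8239130]
f1        = [0.6787234, 0.7150943]
support   = [540, 460]

total = sum(support)
macro    = lambda vals: sum(vals) / len(vals)
weighted = lambda vals: sum(v * s for v, s in zip(vals, support)) / total

print(macro(precision), macro(recall), macro(f1))           # ~0.7146 0.7073 0.6969
print(weighted(precision), weighted(recall), weighted(f1))  # ~0.7212 0.6980 0.6955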
Answer 1
Score: 1
If you already have the confusion matrix, the example below shows how to add things up for the required metrics. It assumes that the columns of the confusion matrix represent the true labels and the rows the predictions.
In the first loop, for each label (column) it finds the number of samples, true positives, and false negatives. In the second loop, for each row (prediction) it sums up the false positives. You can then combine metrics to get the sensitivity/recall, precision, F1-score, etc.
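For reference, these counts combine per class in the standard way (the same expressions are used in the code below):

precision = tp / (tp + fp)
recall    = tp / (tp + fn)
f1-score  = tp / (tp + 0.5 * (fp + fn))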
A final check is made against sklearn.metrics.classification_report(). If you run your own data through this, I recommend confirming results against the sklearn report to ensure things work as expected.
import pandas as pd
import numpy as np

# Synthetic 3-class data and predictions
y_true = ([0]*(11+4+7)
          + [1]*(3+16+2)
          + [2]*(1+5+23))
y_pred = ([0]*11 + [1]*4 + [2]*7
          + [0]*3 + [1]*16 + [2]*2
          + [0]*1 + [1]*5 + [2]*23)

# The question assumes we already have the confusion matrix
# (columns = true labels, rows = predictions)
confusion_matrix = np.array([[11, 3, 1],
                             [4, 16, 5],
                             [7, 2, 23]])

# DataFrame for easier viewing of labels and predictions
confusion_df = pd.DataFrame(
    confusion_matrix,
    columns=['is_orange', 'is_apple', 'is_pear'],
    index=['predicted_orange', 'predicted_apple', 'predicted_pear']
)

metrics = {}  # for recording metrics, for each class
n_classes = confusion_matrix.shape[0]

# First loop: for each label (column), collect the true positives,
# false negatives, and number of samples
for label_idx in range(n_classes):
    metrics[label_idx] = {
        'tp': confusion_matrix[label_idx, label_idx],
        'fn': sum([confusion_matrix[pred_idx, label_idx]
                   for pred_idx in range(n_classes)
                   if pred_idx != label_idx]),
        'n_samples': confusion_matrix[:, label_idx].sum()
    }

# Second loop: for each prediction (row), sum up the false positives
for pred_idx in range(n_classes):
    metrics[pred_idx].update({
        'fp': sum([confusion_matrix[pred_idx, label_idx]
                   for label_idx in range(n_classes)
                   if label_idx != pred_idx])
    })

# Combine the counts into per-class precision, recall, and F1
for cls, cnts in metrics.items():
    metrics[cls].update({
        'precision': cnts['tp'] / (cnts['tp'] + cnts['fp']),
        'recall': cnts['tp'] / (cnts['tp'] + cnts['fn']),
        'f1-score': cnts['tp'] / (cnts['tp'] + 0.5*(cnts['fp'] + cnts['fn']))
    })

print(metrics)  # take a look at the computed metrics per class

# Confirm macro scores against sklearn.metrics.classification_report()
from sklearn.metrics import classification_report
cr = classification_report(y_true, y_pred)
print(cr)
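If you also want the overall scores the question asks about, they can be aggregated from the per-class metrics dictionary above in the same way classification_report does it: macro = unweighted mean over classes, weighted = support-weighted mean, accuracy = diagonal sum over the total. A minimal sketch, reusing the metrics and confusion_matrix objects from the code above:

# Aggregate the per-class metrics into overall (macro / weighted) scores
n_total = sum(m['n_samples'] for m in metrics.values())
for name in ['precision', 'recall', 'f1-score']:
    macro_avg = sum(m[name] for m in metrics.values()) / len(metrics)
    weighted_avg = sum(m[name] * m['n_samples'] for m in metrics.values()) / n_total
    print(name, 'macro:', round(macro_avg, 4), 'weighted:', round(weighted_avg, 4))

# Overall accuracy: correctly classified samples (the diagonal) over all samples
accuracy = confusion_matrix.trace() / confusion_matrix.sum()
print('accuracy:', round(accuracy, 4))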