How can I get the total precision/recall/F1 score from the confusion matrix?

Question

I have the following confusion matrix for an SVC model computed with sklearn:

Classification report:
              precision    recall  f1-score   support

           0  0.7975000 0.5907407 0.6787234       540
           1  0.6316667 0.8239130 0.7150943       460

    accuracy                      0.6980000      1000
   macro avg  0.7145833 0.7073269 0.6969089      1000
weighted avg  0.7212167 0.6980000 0.6954540      1000

How can I compute the overall precision/recall and F1-score starting only from the values in the matrix? That is, without importing the functions from sklearn, just computing them "by hand" from the confusion matrix itself?
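
For reference, the summary rows of a report like this follow from the per-class rows by plain arithmetic. Below is a minimal sketch (added for illustration, assuming sklearn's usual definitions of the macro and weighted averages), using the numbers from the report above:

#Per-class values copied from the report above
precision = {0: 0.7975000, 1: 0.6316667}
recall    = {0: 0.5907407, 1: 0.8239130}
support   = {0: 540, 1: 460}
n_total = sum(support.values())

#macro average: unweighted mean over the classes
macro_precision = sum(precision.values()) / len(precision)                        # ~0.7145833

#weighted average: mean weighted by each class's support
weighted_precision = sum(precision[c] * support[c] for c in precision) / n_total  # ~0.7212167

#accuracy equals the support-weighted recall (the micro average)
accuracy = sum(recall[c] * support[c] for c in recall) / n_total                  # ~0.6980000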

Answer 1

Score: 1

If you already have the confusion matrix, the example below shows how to add things up for the required metrics. It assumes that the columns of the confusion matrix represent the labels and the rows represent the predictions.

In the first loop, for each label (column) it finds the number of samples, the true positives, and the false negatives. In the second loop, for each prediction (row), it sums up the false positives. These counts can then be combined into precision, recall (sensitivity), the F1-score, and so on.

A final check is made against sklearn.metrics.classification_report(). If you run your own data through this, I recommend confirming the results against the sklearn report to make sure everything works as expected.

import pandas as pd
import numpy as np

#Synthetic 3-class data and predictions
y_true = ([0]*(11+4+7)
          + [1]*(3+16+2)
          + [2]*(1+5+23))

y_pred = ([0]*11 + [1]*4 + [2]*7
          + [0]*3 + [1]*16 + [2]*2
          + [0]*1 + [1]*5 + [2]*23)

#question assumes we have the confusion matrix
confusion_matrix = np.array([[11, 3, 1],
                             [4, 16, 5],
                             [7, 2, 23]])

#Dataframe for easier viewing of labels and predictions
confusion_df = pd.DataFrame(
    confusion_matrix,
    columns=['is_orange', 'is_apple', 'is_pear'],
    index=['predicted_orange', 'predicted_apple', 'predicted_pear']
)

metrics = {} #for recording metrics, for each class
n_classes = confusion_matrix.shape[0]
for label_idx in range(n_classes):
    metrics[label_idx] = {
        'tp': confusion_matrix[label_idx, label_idx],
        'fn': sum( [confusion_matrix[pred_idx, label_idx]
                    for pred_idx in range(n_classes)
                    if pred_idx != label_idx] ),
        'n_samples': confusion_matrix[:, label_idx].sum()
    }

for pred_idx in range(n_classes):
    metrics[pred_idx].update({
        'fp': sum( [confusion_matrix[pred_idx, label_idx]
                    for label_idx in range(n_classes)
                    if label_idx != pred_idx] )
    })

for cls, cnts in metrics.items():
    metrics[cls].update({
        'precision': cnts['tp'] / (cnts['tp'] + cnts['fp']),
        'recall': cnts['tp'] / (cnts['tp'] + cnts['fn']),
        'f1-score': cnts['tp'] / ( cnts['tp'] + 0.5*(cnts['fp'] + cnts['fn']))
    })

print(metrics)  #Take a look at the computed metrics per class

#Confirm macro scores against
# sklearn.metrics.classification_report()
from sklearn.metrics import classification_report
cr = classification_report(y_true, y_pred)
print(cr)
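
The loops above give the per-class numbers; the overall scores the question asks about can then be combined by hand as well. A short follow-up sketch (assuming the metrics dict and n_classes from the block above) reproduces the macro average, weighted average, and accuracy rows of the report:

#macro average: unweighted mean of the per-class scores
macro_f1 = sum(m['f1-score'] for m in metrics.values()) / n_classes

#weighted average: per-class scores weighted by their support (n_samples)
n_total = sum(m['n_samples'] for m in metrics.values())
weighted_f1 = sum(m['f1-score'] * m['n_samples'] for m in metrics.values()) / n_total

#accuracy: correctly classified samples (the diagonal) over all samples
accuracy = sum(m['tp'] for m in metrics.values()) / n_total

print(macro_f1, weighted_f1, accuracy)  #compare with the classification_report output above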
