How to compute a confusion matrix using spaCy's Scorer/Example classes?

Question

I am trying to calculate the Accuracy and Specificity of an NER model using spaCy's API. The scorer.score_spans(example, "ents") method used below computes the Recall, Precision and F1 score for the spans predicted by the model, but does not allow for the extrapolation of TP, FP, TN, or FN.

Below is the code I have currently written, with an example of the data structure I am using when passing my expected entities into the model.

Code Being Used to Score the Model:

import spacy
from spacy.scorer import Scorer
from spacy.training.example import Example

scorer = Scorer()
example = []
for obs in example_list:
    print('Input for a prediction:', obs['full_text'])
    pred = custom_nlp(obs['full_text'])  # custom_nlp is the custom model I am using to generate docs
    print('Predicted based off of input:', pred, '// Entities being reviewed:', obs['entities'])
    temp = Example.from_dict(pred, {'entities': obs['entities']})
    example.append(temp)
scores = scorer.score_spans(example, "ents")

The data structure I am currently using to load the Example class (a list of dictionaries):

example_list[0]
{'full_text': 'I would like to remove my kid Florence from the will. How do I do that?',
 'entities': [(30, 38, 'PERSON')]}
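
The (start, end) values in 'entities' are character offsets into full_text, which you can verify directly from the record above:

obs = example_list[0]
start, end, label = obs['entities'][0]   # (30, 38, 'PERSON')
print(obs['full_text'][start:end])       # -> Florence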

The result returned from running print(scores) is as expected: a dictionary of the entity recognition's overall precision, recall and F1 score, plus the same metrics per entity type.

{'ents_p': 0.8731019522776573,
 'ents_r': 0.9179019384264538,
 'ents_f': 0.8949416342412452,
 'ents_per_type': {'PERSON': {'p': 0.9039145907473309,
   'r': 0.9694656488549618,
   'f': 0.9355432780847145},
  'GPE': {'p': 0.7973856209150327,
   'r': 0.9384615384615385,
   'f': 0.8621908127208481},
  'STREET_ADDRESS': {'p': 0.8308457711442786,
   'r': 0.893048128342246,
   'f': 0.8608247422680412},
  'ORGANIZATION': {'p': 0.9565217391304348,
   'r': 0.7415730337078652,
   'f': 0.8354430379746837},
  'CREDIT_CARD': {'p': 0.9411764705882353, 'r': 1.0, 'f': 0.9696969696969697},
  'AGE': {'p': 1.0, 'r': 1.0, 'f': 1.0},
  'US_SSN': {'p': 1.0, 'r': 1.0, 'f': 1.0},
  'DOMAIN_NAME': {'p': 0.4, 'r': 1.0, 'f': 0.5714285714285715},
  'TITLE': {'p': 0.8709677419354839, 'r': 0.84375, 'f': 0.8571428571428571},
  'PHONE_NUMBER': {'p': 0.8275862068965517,
   'r': 0.8275862068965517,
   'f': 0.8275862068965517},
  'EMAIL_ADDRESS': {'p': 1.0, 'r': 1.0, 'f': 1.0},
  'DATE_TIME': {'p': 1.0, 'r': 1.0, 'f': 1.0},
  'NRP': {'p': 1.0, 'r': 1.0, 'f': 1.0},
  'IBAN_CODE': {'p': 1.0, 'r': 1.0, 'f': 1.0},
  'IP_ADDRESS': {'p': 0.75, 'r': 0.75, 'f': 0.75},
  'ZIP_CODE': {'p': 0.8333333333333334,
   'r': 0.7142857142857143,
   'f': 0.7692307692307692},
  'US_DRIVER_LICENSE': {'p': 1.0, 'r': 1.0, 'f': 1.0}}}
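
The per-type results are plain nested dict entries, so individual values can be read back directly:

person = scores['ents_per_type']['PERSON']
print(person['p'], person['r'], person['f'])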

How can I extrapolate the TP, FP, TN and FN from this function using some form of an attribute?

Answer 1

Score: 1


Copied from https://github.com/explosion/spaCy/discussions/12682#discussioncomment-6036758:

> This isn't currently supported by the provided scorers (you haven't overlooked any built-in options), but you can replace the default scorer with your own custom registered scoring method in the scorer setting in the config.
>
> Here's what the basics look like when you define a custom scorer (this example just modifies the names of the returned keys):
>
> spaCy/spacy/tests/test_language.py Lines 188-199
>
> ```
> def custom_textcat_score(examples, **kwargs):
>     scores = Scorer.score_cats(
>         examples,
>         "cats",
>         multi_label=False,
>         **kwargs,
>     )
>     return {f"custom_{k}": v for k, v in scores.items()}
>
> @spacy.registry.scorers("test_custom_textcat_scorer")
> def make_custom_textcat_scorer():
>     return custom_textcat_score
> ```

>
> You'd usually provide your custom scorer with -c code.py for spacy train, spacy evaluate, spacy package, etc.
>
> The current NER scorer is here:
>
> spaCy/spacy/scorer.py Lines 750-792
>
> ```
> def get_ner_prf(examples: Iterable[Example], **kwargs) -> Dict[str, Any]:
>     """Compute micro-PRF and per-entity PRF scores for a sequence of examples."""
>     score_per_type = defaultdict(PRFScore)
>     for eg in examples:
>         if not eg.y.has_annotation("ENT_IOB"):
>             continue
>         golds = {(e.label_, e.start, e.end) for e in eg.y.ents}
>         align_x2y = eg.alignment.x2y
>         for pred_ent in eg.x.ents:
>             if pred_ent.label_ not in score_per_type:
>                 score_per_type[pred_ent.label_] = PRFScore()
>             indices = align_x2y[pred_ent.start : pred_ent.end]
>             if len(indices):
>                 g_span = eg.y[indices[0] : indices[-1] + 1]
>                 # Check we aren't missing annotation on this span. If so,
>                 # our prediction is neither right nor wrong, we just
>                 # ignore it.
>                 if all(token.ent_iob != 0 for token in g_span):
>                     key = (pred_ent.label_, indices[0], indices[-1] + 1)
>                     if key in golds:
>                         score_per_type[pred_ent.label_].tp += 1
>                         golds.remove(key)
>                     else:
>                         score_per_type[pred_ent.label_].fp += 1
>         for label, start, end in golds:
>             score_per_type[label].fn += 1
>     totals = PRFScore()
>     for prf in score_per_type.values():
>         totals += prf
>     if len(totals) > 0:
>         return {
>             "ents_p": totals.precision,
>             "ents_r": totals.recall,
>             "ents_f": totals.fscore,
>             "ents_per_type": {k: v.to_dict() for k, v in score_per_type.items()},
>         }
>     else:
>         return {
>             "ents_p": None,
>             "ents_r": None,
>             "ents_f": None,
>             "ents_per_type": None,
>         }
> ```
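
A custom registered scorer can keep these raw counts instead of discarding them. Below is a minimal sketch of that idea: the scorer name "ner_confusion_scorer" is made up, and it simplifies the quoted get_ner_prf by assuming the predicted and reference docs share a tokenization (so it skips the alignment step and counts only exact (label, start, end) matches as TP):

```
from collections import defaultdict

import spacy
from spacy.scorer import PRFScore


def ner_confusion_score(examples, **kwargs):
    # Same span matching as get_ner_prf above, minus alignment handling:
    # a predicted span is a TP only if an identical gold span exists.
    score_per_type = defaultdict(PRFScore)
    for eg in examples:
        golds = {(e.label_, e.start, e.end) for e in eg.y.ents}
        for pred_ent in eg.x.ents:
            key = (pred_ent.label_, pred_ent.start, pred_ent.end)
            if key in golds:
                score_per_type[pred_ent.label_].tp += 1
                golds.remove(key)
            else:
                score_per_type[pred_ent.label_].fp += 1
        # Any gold spans that were never matched are misses.
        for label, _start, _end in golds:
            score_per_type[label].fn += 1
    return {
        "ents_confusion": {
            label: {"tp": s.tp, "fp": s.fp, "fn": s.fn}
            for label, s in score_per_type.items()
        }
    }


@spacy.registry.scorers("ner_confusion_scorer")
def make_ner_confusion_scorer():
    return ner_confusion_score
```

As described in the quoted comment, you would then point the component's scorer setting in the config at this registered name and pass the defining module with -c code.py.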

I asked the spaCy maintainers about this; the full discussion is here: https://github.com/explosion/spaCy/discussions/12682

Answer 2

You can derive TP, FP, and FN for each entity type from these scores, provided you also know how many gold entities of each type are in your data.

First, you need to know the definitions behind these scores:

  • TP (True Positives): the number of entities the model predicted correctly.
  • FP (False Positives): the number of predicted entities that do not match any gold entity.
  • TN (True Negatives): the number of negative cases the model correctly left unpredicted.
  • FN (False Negatives): the number of gold entities the model failed to predict.

Since precision is p = TP / (TP + FP), recall is r = TP / (TP + FN), and TP + FN equals the number of gold entities of a given type (call it N_gold), you can invert the reported scores as follows:

  • TP (True Positives): you can compute TP for a specific entity type with:

    TP = scores['ents_per_type']['ENTITY_TYPE']['r'] * N_gold

    where 'ENTITY_TYPE' is the entity type you care about, 'r' is its recall, and N_gold is the number of gold entities of that type across your data (the number of annotated spans of that type, not the length of example_list).

  • FP (False Positives): you can compute FP for a specific entity type with:

    FP = TP / scores['ents_per_type']['ENTITY_TYPE']['p'] - TP

  • TN (True Negatives): span-based NER has no well-defined population of negative spans, so TN (and with it Specificity and Accuracy) is not meaningful in this context.

  • FN (False Negatives): you can compute FN for a specific entity type with:

    FN = N_gold - TP

Make sure to replace 'ENTITY_TYPE' with the specific entity type you care about; you can then use these formulas to compute TP, FP and FN. TN generally does not apply here, because the annotation scheme only defines positive spans.
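
If you only have the scores dict, the formulas above translate into a small helper. This is a sketch under one assumption: example_list (from the question) is available so that gold entities can be counted per type.

```
from collections import Counter

# Count gold entities per type from the question's example_list structure.
gold_counts = Counter(
    label for obs in example_list for _start, _end, label in obs['entities']
)


def confusion_from_scores(scores, label, n_gold):
    """Invert precision/recall back into TP/FP/FN for one entity type."""
    p = scores['ents_per_type'][label]['p']
    r = scores['ents_per_type'][label]['r']
    tp = r * n_gold                 # r = TP / N_gold
    fp = tp / p - tp if p else 0.0  # p = TP / (TP + FP)
    fn = n_gold - tp
    # The true counts are integers; rounding absorbs float error.
    return round(tp), round(fp), round(fn)


print(confusion_from_scores(scores, 'PERSON', gold_counts['PERSON']))
```

As a worked check, the PERSON row in the question's output (p ≈ 0.9039, r ≈ 0.9695) is consistent with 262 gold PERSON spans, which inverts to TP = 254, FP = 27, FN = 8.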
