March 15, 2023 19:22:10

Why do we need to write a function to "Compute Metrics" with Huggingface Question Answering Trainer when evaluating SQuAD?

Question

Currently, I'm trying to build an extractive QA pipeline, following the Huggingface Course on the topic. There, they show how to create a compute_metrics() function to evaluate the model after training. However, I was wondering if there's a way to obtain those metrics during training, by passing the compute_metrics() function directly to the trainer. They train using only the training loss, and I would like to see the evaluation F1 score during training.

But as I see it, this might be a bit tricky: they need the original spans to calculate the SQuAD metrics, and those original spans are not passed along with the tokenized training dataset.

import evaluate

metric = evaluate.load("squad")

predicted_answers = [{'id': '56be4db0acb8001400a502ec', 'prediction_text': 'Denver Broncos'}]
theoretical_answers = [{'id': '56be4db0acb8001400a502ec', 'answers': {'text': ['Denver Broncos', 'Denver Broncos', 'Denver Broncos'], 'answer_start': [177, 177, 177]}}]

metric.compute(predictions=predicted_answers, references=theoretical_answers)

That's why they write the whole compute_metrics() function, which takes a few more parameters than just the predictions output by the evaluation loop: it needs them to rebuild those spans.
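To see why the original spans are needed, here is a minimal, stdlib-only sketch of the post-processing step: it picks the highest-scoring (start, end) token pair from the start/end logits and maps it back to characters in the original context through the tokenizer's offset mapping. The function name `best_answer_span`, the `max_answer_len` limit and the toy logits/offsets are illustrative assumptions; the course's real version also keeps an n-best list and handles several overlapping features per example.

```python
from typing import List, Optional, Tuple

Offset = Optional[Tuple[int, int]]


def best_answer_span(
    context: str,
    start_logits: List[float],
    end_logits: List[float],
    offsets: List[Offset],
    max_answer_len: int = 30,
) -> str:
    """Return the context substring for the highest-scoring valid token span.

    offsets[i] is the (char_start, char_end) of token i in `context`, or None
    for tokens outside the context (question tokens, special tokens).
    """
    best_score = float("-inf")
    best_text = ""
    for start, s_logit in enumerate(start_logits):
        if offsets[start] is None:
            continue
        # Only end positions at or after `start`, within the length limit.
        for end in range(start, min(start + max_answer_len, len(end_logits))):
            if offsets[end] is None:
                continue
            score = s_logit + end_logits[end]
            if score > best_score:
                best_score = score
                # Map token positions back to characters of the ORIGINAL text:
                # this is why the untokenized context must still be available.
                best_text = context[offsets[start][0]:offsets[end][1]]
    return best_text


if __name__ == "__main__":
    context = "The Denver Broncos won Super Bowl 50."
    # Hypothetical offsets; token 0 stands for a question/special token.
    offsets = [None, (0, 3), (4, 10), (11, 18), (19, 22), (23, 28), (29, 33), (34, 36)]
    start_logits = [0.0, 0.1, 5.0, 0.2, 0.0, 0.0, 0.0, 0.0]
    end_logits = [0.0, 0.0, 0.3, 4.0, 0.1, 0.0, 0.0, 0.0]
    print(best_answer_span(context, start_logits, end_logits, offsets))  # Denver Broncos
```

The resulting string is what goes into `prediction_text` for the squad metric; the tokenized dataset alone (token IDs, start/end positions) does not contain it.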

Q: How do I make the squad metric from evaluate output F1 and accuracy scores? How do I use the squad metric with the `Trainer` object?

Answer 1

Score: 5

The compute_metrics function can be passed into the Trainer so that it evaluates on the metrics you need, e.g.

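from transformers import Trainer

# model, args, train_dataset, validation_dataset, tokenizer and compute_metrics are
# assumed to be defined already (e.g. as in the course). compute_metrics is only
# called when the Trainer evaluates, so args must enable periodic evaluation during
# training, e.g. evaluation_strategy="epoch" (eval_strategy in newer releases).
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=validation_dataset,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics
)
trainer.train()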

I'm not sure whether it works out of the box with the code the course uses to process train_dataset and validation_dataset: https://huggingface.co/course/chapter7

But this chapter shows how Trainer and compute_metrics work together: https://huggingface.co/course/chapter3/3

Before reading the rest of the answer, here are some disclaimers:

Try to work through the full course (Chapters 1-9); after that, it should make sense how compute_metrics and evaluate metrics are used, and why you can't plug an evaluate metric directly into the Trainer object. https://huggingface.co/course/
Alternatively, walking through this book would help too: https://www.oreilly.com/library/view/natural-language-processing/9781098136789/

And now, here goes...

First, let's take a look at what the `evaluate` library does.

From https://huggingface.co/spaces/evaluate-metric/squad

from evaluate import load

squad_metric = load("squad")

predictions = [{'prediction_text': '1976', 'id': '56e10a3be3433e1400422b22'}]
references = [{'answers': {'answer_start': [97], 'text': ['1976']}, 'id': '56e10a3be3433e1400422b22'}]

results = squad_metric.compute(predictions=predictions, references=references)

print(results)

[out]:

{'exact_match': 100.0, 'f1': 100.0}
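For intuition about what those two numbers mean, the sketch below re-implements the scoring logic of the official SQuAD v1.1 evaluation script (which, as far as I know, is what the `squad` metric wraps): normalize both strings, compute exact match and token-level F1, take the best score over all reference answers, and average as a percentage. It is an illustration with our own function names, not the `evaluate` library's code.

```python
import collections
import re
import string
from typing import Dict, List

_PUNCT = set(string.punctuation)


def normalize_answer(s: str) -> str:
    """Lower-case, strip punctuation and English articles, collapse whitespace."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in _PUNCT)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())


def exact_match(prediction: str, truth: str) -> float:
    return float(normalize_answer(prediction) == normalize_answer(truth))


def f1(prediction: str, truth: str) -> float:
    pred_tokens = normalize_answer(prediction).split()
    truth_tokens = normalize_answer(truth).split()
    # Multiset intersection: tokens shared by prediction and reference.
    common = collections.Counter(pred_tokens) & collections.Counter(truth_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(truth_tokens)
    return 2 * precision * recall / (precision + recall)


def squad_scores(predictions: List[Dict], references: List[Dict]) -> Dict[str, float]:
    """Takes the same input format as the "squad" metric; returns percentages."""
    answers_by_id = {r["id"]: r["answers"]["text"] for r in references}
    em_total = f1_total = 0.0
    for p in predictions:
        truths = answers_by_id[p["id"]]
        # Score against every reference answer and keep the best one.
        em_total += max(exact_match(p["prediction_text"], t) for t in truths)
        f1_total += max(f1(p["prediction_text"], t) for t in truths)
    n = len(predictions)
    return {"exact_match": 100.0 * em_total / n, "f1": 100.0 * f1_total / n}


predictions = [{"prediction_text": "1976", "id": "56e10a3be3433e1400422b22"}]
references = [{"answers": {"answer_start": [97], "text": ["1976"]}, "id": "56e10a3be3433e1400422b22"}]
print(squad_scores(predictions, references))  # {'exact_match': 100.0, 'f1': 100.0}
```

A partial answer shows the difference between the two scores: "Denver" against the reference "Denver Broncos" gets 0 exact match but an F1 of about 66.7 (precision 1, recall 0.5).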

Next, we take a look at what the `compute_metrics` argument in the `Trainer` expects

From Line 600 https://github.com/huggingface/transformers/blob/main/examples/pytorch/question-answering/run_qa.py

    metric = evaluate.load("squad_v2" if data_args.version_2_with_negative else "squad")

    def compute_metrics(p: EvalPrediction):
        return metric.compute(predictions=p.predictions, references=p.label_ids)

    # Initialize our Trainer
    trainer = QuestionAnsweringTrainer(
        model=model,
        args=training_args,
        train_dataset=train_dataset if training_args.do_train else None,
        eval_dataset=eval_dataset if training_args.do_eval else None,
        eval_examples=eval_examples if training_args.do_eval else None,
        tokenizer=tokenizer,
        data_collator=data_collator,
        post_process_function=post_processing_function,
        compute_metrics=compute_metrics,
    )

The compute_metrics argument of the QuestionAnsweringTrainer expects a function that:

[in]: Takes an EvalPrediction object as input
[out]: Returns a dict of key-value pairs, where each key is the name of a metric (a string) and each value is a floating-point number

Un momento! (Wait a minute!) What are these `QuestionAnsweringTrainer` and `EvalPrediction` objects?

Q: Why are you not using the normal `Trainer` object?

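A: The QuestionAnsweringTrainer is a subclass of Trainer specific to the QA task. It is defined in trainer_qa.py, next to run_qa.py in the examples folder, not in the transformers package itself, so it cannot be imported with `from transformers import ...`. If you're going to train a model to evaluate on the SQuAD dataset, the QuestionAnsweringTrainer is the most appropriate Trainer object to use.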
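Roughly, what the subclass adds is a post-processing step between prediction and scoring: it keeps the raw examples (`eval_examples`) around, turns the model outputs back into answer strings with `post_process_function`, and only then calls `compute_metrics`. The toy classes below are a heavily simplified, framework-free sketch of that control flow under hypothetical names (`ToyQATrainer`, `fake_predict`, `post_process`); they are not the real trainer_qa.py code, whose post-processing function also receives the tokenized eval dataset.

```python
from typing import Callable, Dict, List, NamedTuple, Tuple


class EvalPred(NamedTuple):
    """Stand-in for transformers' EvalPrediction (same two main fields)."""
    predictions: list
    label_ids: list


class ToyQATrainer:
    """Simplified control flow: predict -> post-process -> compute_metrics."""

    def __init__(self, predict_fn: Callable, eval_examples: List[Dict],
                 post_process_function: Callable, compute_metrics: Callable):
        self.predict_fn = predict_fn
        self.eval_examples = eval_examples
        self.post_process_function = post_process_function
        self.compute_metrics = compute_metrics

    def evaluate(self) -> Dict[str, float]:
        raw_outputs = self.predict_fn(self.eval_examples)
        # The extra step: rebuild answer strings from the original examples,
        # which a plain Trainer's compute_metrics never sees.
        eval_pred = self.post_process_function(self.eval_examples, raw_outputs)
        metrics = self.compute_metrics(eval_pred)
        # Prefix keys the way Trainer reports evaluation metrics.
        return {f"eval_{name}": value for name, value in metrics.items()}


# --- toy usage ---------------------------------------------------------
examples = [{"id": "q1", "context": "The Denver Broncos won.",
             "answers": {"text": ["Denver Broncos"], "answer_start": [4]}}]


def fake_predict(examples: List[Dict]) -> List[Tuple[int, int]]:
    # Stand-in for the model: returns a (char_start, char_end) span per example
    # instead of real start/end logits, to keep the toy short.
    return [(4, 18) for _ in examples]


def post_process(examples: List[Dict], spans: List[Tuple[int, int]]) -> EvalPred:
    preds = [{"id": ex["id"], "prediction_text": ex["context"][s:e]}
             for ex, (s, e) in zip(examples, spans)]
    refs = [{"id": ex["id"], "answers": ex["answers"]} for ex in examples]
    return EvalPred(predictions=preds, label_ids=refs)


def compute_metrics(p: EvalPred) -> Dict[str, float]:
    # Un-normalized exact match, enough for the toy.
    gold = {r["id"]: r["answers"]["text"] for r in p.label_ids}
    hits = sum(x["prediction_text"] in gold[x["id"]] for x in p.predictions)
    return {"exact_match": 100.0 * hits / len(p.predictions)}


trainer = ToyQATrainer(fake_predict, examples, post_process, compute_metrics)
print(trainer.evaluate())  # {'eval_exact_match': 100.0}
```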

[Suggestion]: The HuggingFace devs and dev advocates should probably add some notes about the QuestionAnsweringTrainer object to https://huggingface.co/course/chapter7/7?fw=pt

Q: What is this `EvalPrediction` object then?

A: Officially, I guess it's this: https://discuss.huggingface.co/t/what-does-evalprediction-predictions-contain-exactly/1691/5

If we look at the docs (https://huggingface.co/docs/transformers/internal/trainer_utils) and the code, the object appears to be a custom container class that holds (i) predictions, (ii) label_ids and (iii) inputs, each an np.ndarray. These are what the model's inference function needs to return for compute_metrics to work as expected.

from typing import Optional, Tuple, Union

import numpy as np


class EvalPrediction:
    """
    Evaluation output (always contains labels), to be used to compute metrics.
    Parameters:
        predictions (`np.ndarray`): Predictions of the model.
        label_ids (`np.ndarray`): Targets to be matched.
        inputs (`np.ndarray`, *optional*)
    """

    def __init__(
        self,
        predictions: Union[np.ndarray, Tuple[np.ndarray]],
        label_ids: Union[np.ndarray, Tuple[np.ndarray]],
        inputs: Optional[Union[np.ndarray, Tuple[np.ndarray]]] = None,
    ):
        self.predictions = predictions
        self.label_ids = label_ids
        self.inputs = inputs

    def __iter__(self):
        if self.inputs is not None:
            return iter((self.predictions, self.label_ids, self.inputs))
        else:
            return iter((self.predictions, self.label_ids))

    def __getitem__(self, idx):
        if idx == 0:
            return self.predictions
        elif idx == 1:
            return self.label_ids
        elif idx == 2:
            return self.inputs

Hey, you still haven't answered the question of how I can pass `evaluate.load('squad')` directly to the `compute_metrics` argument!

Right: for now you can't use it directly, but the wrapper you need is simple.

Step 1: Make sure the model you want to use outputs the required EvalPrediction object, containing predictions and label_ids.

If you're using one of the models supported for QA in Huggingface's transformers library, it should already output the expected EvalPrediction.

Otherwise, take a look at the models supported by https://github.com/huggingface/transformers/tree/main/examples/pytorch/question-answering

Step 2: Since model inference outputs an `EvalPrediction` but compute_metrics must return a dictionary, you have to wrap the `evaluate` metric's `compute` call.

E.g.

    metric = evaluate.load("squad_v2" if data_args.version_2_with_negative else "squad")

    def compute_metrics(p: EvalPrediction):
        return metric.compute(predictions=p.predictions, references=p.label_ids)

Q: Do we really always need to write that wrapper function?

A: For now, yes. By design, compute_metrics is not directly integrated with the outputs of evaluate metrics, which gives the developers of the different metrics the freedom to define what their inputs/outputs look like.

But there might be hope to make compute_metrics more integrated with evaluate.metric if someone picks this feature request up! https://discuss.huggingface.co/t/feature-request-adding-default-compute-metrics-to-popular-evaluate-metrics/33909/3
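Until something like that exists, one way to avoid rewriting the wrapper in every project is a small factory that accepts any object with a `compute(predictions=..., references=...)` method (the calling convention used for the `evaluate` metrics above) and returns a `compute_metrics` function. The name `make_compute_metrics` and the stub metric used for the demo are hypothetical; with the real library you would pass `evaluate.load("squad")` instead of the stub, and the predictions/references still have to be post-processed first, as in run_qa.py.

```python
from types import SimpleNamespace
from typing import Any, Callable, Dict


def make_compute_metrics(metric: Any) -> Callable[[Any], Dict[str, float]]:
    """Build a compute_metrics function from any object exposing
    compute(predictions=..., references=...)."""

    def compute_metrics(p: Any) -> Dict[str, float]:
        # p must already carry post-processed predictions/references, e.g. the
        # EvalPrediction built by run_qa.py's post-processing function.
        results = metric.compute(predictions=p.predictions, references=p.label_ids)
        # Cast numeric values (some metrics also return integer counts) to float.
        return {name: float(value) for name, value in results.items()}

    return compute_metrics


class StubSquadMetric:
    """Hypothetical offline stand-in for evaluate.load("squad") (exact match only)."""

    def compute(self, predictions, references):
        gold = {r["id"]: r["answers"]["text"] for r in references}
        hits = sum(pred["prediction_text"] in gold[pred["id"]] for pred in predictions)
        return {"exact_match": 100.0 * hits / len(predictions), "total": len(predictions)}


compute_metrics = make_compute_metrics(StubSquadMetric())
p = SimpleNamespace(
    predictions=[{"id": "q1", "prediction_text": "1976"}],
    label_ids=[{"id": "q1", "answers": {"text": ["1976"], "answer_start": [97]}}],
)
print(compute_metrics(p))  # {'exact_match': 100.0, 'total': 1.0}
```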
