Get all labels / entity groups available to a model

Question

I have the following code to get the named entity values from a given text:

from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

# Load the tokenizer and token-classification model from the Hugging Face Hub
model_name = "Davlan/distilbert-base-multilingual-cased-ner-hrl"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

# aggregation_strategy="max" merges sub-word tokens into words and groups them into entities
nlp = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="max")

example = "My name is Johnathan Smith and I work at Apple"
ner_results = nlp(example)
print(ner_results)

The following is the output:

[{'end': 26,
  'entity_group': 'PER',
  'score': 0.9994689,
  'start': 11,
  'word': 'Johnathan Smith'},
 {'end': 46,
  'entity_group': 'ORG',
  'score': 0.9983876,
  'start': 41,
  'word': 'Apple'}]

In the above example, the labels / entity groups are ORG and PER. How can I find all the labels / entity groups that the model can return?

Kindly advise.

Answer 1

Score: 2

You can get this information from the id2label property of your model config:

model.config.id2label

Output:

{0: 'O',
 1: 'B-DATE',
 2: 'I-DATE',
 3: 'B-PER',
 4: 'I-PER',
 5: 'B-ORG',
 6: 'I-ORG',
 7: 'B-LOC',
 8: 'I-LOC'}
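
The labels above use B-/I- prefixes (begin/inside), while the pipeline reports the bare group name in entity_group. A minimal sketch of reducing the mapping to its entity groups follows; the helper name entity_groups is hypothetical, and the dictionary is copied from the output above so the snippet runs without downloading the model.

```python
# Mapping copied from the output above; with a loaded model this would be
# model.config.id2label instead of a hard-coded dictionary.
id2label = {0: 'O',
            1: 'B-DATE', 2: 'I-DATE',
            3: 'B-PER', 4: 'I-PER',
            5: 'B-ORG', 6: 'I-ORG',
            7: 'B-LOC', 8: 'I-LOC'}


def entity_groups(id2label):
    """Hypothetical helper: return the sorted entity groups in a label map."""
    groups = set()
    for label in id2label.values():
        if label == "O":
            continue  # "O" marks tokens outside any entity
        prefix, sep, name = label.partition("-")
        # "B-PER" and "I-PER" both map to "PER"; unprefixed labels are kept as-is
        groups.add(name if sep and prefix in {"B", "I"} else label)
    return sorted(groups)


print(entity_groups(id2label))  # ['DATE', 'LOC', 'ORG', 'PER']
```

If only the label list is needed, AutoConfig.from_pretrained(...) from transformers should expose the same id2label attribute without loading the model weights.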

P.S.: Although the model has output weights for the *-DATE labels, it does not seem able to predict them, apparently because it was never trained on DATE annotations.
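
One rough way to probe this observation is to compare the configured groups with the groups that actually show up in pipeline output. The sketch below is only an illustration: unseen_groups is a hypothetical helper, and the data is the single result from the question (scores omitted), which is far too little evidence on its own. A real check would run the pipeline over many texts, including some that contain dates.

```python
# Entity groups configured in id2label (B-/I- prefixes removed)
configured = {"DATE", "PER", "ORG", "LOC"}

# Pipeline output copied from the question, without the scores
ner_results = [
    {"entity_group": "PER", "word": "Johnathan Smith", "start": 11, "end": 26},
    {"entity_group": "ORG", "word": "Apple", "start": 41, "end": 46},
]


def unseen_groups(configured, results):
    """Hypothetical helper: configured groups that never appear in the results."""
    observed = {r["entity_group"] for r in results}
    return sorted(set(configured) - observed)


print(unseen_groups(configured, ner_results))  # ['DATE', 'LOC']
```

Here LOC is missing only because the example sentence contains no location, which shows why a larger and more varied sample is needed before drawing conclusions about DATE.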

Answer 2

Score: 1

The model card on the Hugging Face Hub states this explicitly: https://huggingface.co/Davlan/distilbert-base-multilingual-cased-ner-hrl

Under Training Data, in the second table, you can see that the supported entities are "PER" (person), "ORG" (organisation) and "LOC" (location).

huangapple
  • Posted on May 17, 2023 at 21:11:36
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76272502.html