Get all labels / entity groups available to a model
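Short answer: you can get all labels / entity groups from the configuration of the model you are using. Here, that means checking the config of the "Davlan/distilbert-base-multilingual-cased-ner-hrl" model, which normally lists the available labels.
Question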
I have the following code to get the named entity values from a given text:
from transformers import AutoTokenizer, AutoModelForTokenClassification
from transformers import pipeline
tokenizer = AutoTokenizer.from_pretrained("Davlan/distilbert-base-multilingual-cased-ner-hrl")
model = AutoModelForTokenClassification.from_pretrained("Davlan/distilbert-base-multilingual-cased-ner-hrl")
nlp = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="max")
example = "My name is Johnathan Smith and I work at Apple"
ner_results = nlp(example)
print(ner_results)
The following is the output:
[{'end': 26,
'entity_group': 'PER',
'score': 0.9994689,
'start': 11,
'word': 'Johnathan Smith'},
{'end': 46,
'entity_group': 'ORG',
'score': 0.9983876,
'start': 41,
'word': 'Apple'}]
In the above example, the labels / entity groups are ORG and PER. How can I find all the labels / entity groups available?
Kindly advise.
Answer 1
Score: 2
You can get this information from the id2label property of your model config:
model.config.id2label
Output:
{0: 'O',
1: 'B-DATE',
2: 'I-DATE',
3: 'B-PER',
4: 'I-PER',
5: 'B-ORG',
6: 'I-ORG',
7: 'B-LOC',
8: 'I-LOC'}
P.S.: Although the config lists *-DATE labels, and the model therefore has weights for classifying tokens as *-DATE, it does not seem able to predict them, because it was never trained on them.
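The id2label mapping uses BIO-style tags (B-PER, I-PER, ...), while the pipeline's entity_group field reports the tag without its B-/I- prefix. A minimal sketch of collapsing the mapping into group names follows. It assumes the dictionary printed above, and `entity_groups` is a hypothetical helper written for illustration, not part of transformers:

```python
def entity_groups(id2label):
    """Collapse BIO-style tags (B-PER, I-PER, ...) into sorted entity group names."""
    groups = set()
    for label in id2label.values():
        if label == "O":
            # "O" marks tokens outside any entity, so it is not a group.
            continue
        prefix, sep, name = label.partition("-")
        # Strip a leading B-/I- marker; keep any other label unchanged.
        groups.add(name if sep and prefix in {"B", "I"} else label)
    return sorted(groups)


# Mapping copied from the output of model.config.id2label shown above.
id2label = {
    0: "O",
    1: "B-DATE",
    2: "I-DATE",
    3: "B-PER",
    4: "I-PER",
    5: "B-ORG",
    6: "I-ORG",
    7: "B-LOC",
    8: "I-LOC",
}

print(entity_groups(id2label))  # ['DATE', 'LOC', 'ORG', 'PER']
```

With the model loaded as in the question, the same call would be entity_groups(model.config.id2label). Note that this lists every group the config declares, so DATE shows up here even if the model rarely or never predicts it.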
Answer 2
Score: 1
The model card on the Hugging Face Hub states this explicitly: https://huggingface.co/Davlan/distilbert-base-multilingual-cased-ner-hrl
Under Training Data, in the second table, you can see that the supported entities are ["PER", "ORG", "LOC"], for person, organisation, and location, respectively.