All matches in a line: spaCy Matcher

Question

I am looking for a way to print all the matches in a line using the spaCy Matcher.

For example, here I am trying to extract experience durations:

import spacy
from spacy.matcher import Matcher

nlp = spacy.load("en_core_web_sm")
doc = nlp("1+ years of experience in XX, 2 years of experiance in YY")
pattern = [{'POS': 'NUM'}, {'ORTH': '+', "OP": "?"}, {"LOWER": {"REGEX": "years?|months?"}}]
matcher = Matcher(nlp.vocab)
matcher.add("Skills", None, pattern)
matches = matcher(doc)
# Each match is a (match_id, start, end) tuple; this prints only the first one
print(doc[matches[0][1]:matches[0][2]])

This prints only 1+ years.

But I am looking for a solution that gives the output
['1+ years', '2 years']

Answer 1

Score: 2

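Specify the first item as 'LIKE_NUM': True instead of 'POS': 'NUM'. LIKE_NUM is a lexical attribute, so it does not depend on the part-of-speech tagger labeling every number as NUM. Then loop over all the matches instead of indexing only matches[0]:

pattern = [{'LIKE_NUM': True}, {'ORTH': '+', "OP": "?"}, {"LOWER": {"REGEX": "(?:year|month)s?"}}]

I also contracted years?|months? to (?:year|month)s?. You might even consider matching the full token string with ^(?:year|month)s?$, but that is not necessary at this point.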
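
To see what the anchors would change, here is a small sketch using only Python's standard re module. It assumes the REGEX operator does a search-style (substring) match on the token text, which is what the suggestion about ^ and $ implies. The sample tokens are made up for illustration.

```python
import re

# Unanchored: matches if the pattern occurs anywhere inside the token
unanchored = re.compile(r"(?:year|month)s?")
# Anchored: matches only if the whole token is year(s) or month(s)
anchored = re.compile(r"^(?:year|month)s?$")

# Hypothetical token strings, already lowercased as LOWER would do
tokens = ["years", "month", "lightyears", "monthly"]

for token in tokens:
    print(token, bool(unanchored.search(token)), bool(anchored.search(token)))
```

For the sentence in the question both forms behave the same, since the only token containing "year" is "years" itself.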

Code:

import spacy
from spacy.matcher import Matcher

nlp = spacy.load("en_core_web_sm")
matcher = Matcher(nlp.vocab)
pattern = [{'LIKE_NUM': True}, {'ORTH': '+', "OP": "?"}, {"LOWER": {"REGEX": "(?:year|month)s?"}}]
# spaCy v2 signature; in spaCy v3 this is matcher.add("Skills", [pattern])
matcher.add("Skills", None, pattern)

doc = nlp("1+ years of experience in XX, 2 years of experiance in YY")

matches = matcher(doc)
# Each match is a (match_id, start, end) tuple of token indices
for _, start, end in matches:
    print(doc[start:end].text)

Output:

1+ years
2 years
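
If the goal is the list form asked for in the question, the matched spans can be collected with a list comprehension. This is a minimal sketch, not the only way to do it. It assumes spaCy v3, where Matcher.add takes a list of patterns, and it uses spacy.blank("en") because LIKE_NUM, ORTH and LOWER need no trained model. The variable name experience_spans is only illustrative.

```python
import spacy
from spacy.matcher import Matcher

# Assumption: a blank English pipeline is enough here, since the pattern
# uses only lexical attributes (LIKE_NUM, ORTH, LOWER) and no tagger output
nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)
pattern = [
    {"LIKE_NUM": True},
    {"ORTH": "+", "OP": "?"},
    {"LOWER": {"REGEX": "(?:year|month)s?"}},
]
# Assumption: spaCy v3 signature (patterns passed as a list)
matcher.add("Skills", [pattern])

doc = nlp("1+ years of experience in XX, 2 years of experiance in YY")

# Collect the text of every matched span instead of printing one by one
experience_spans = [doc[start:end].text for _, start, end in matcher(doc)]
print(experience_spans)
```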

Posted by huangapple on January 6, 2020 at 22:33:53
When reposting, please keep the link to this article: https://go.coder-hub.com/59613898.html