All matches in a line: spaCy Matcher
Question
I am looking for a way to print all the matches in a line using the spaCy Matcher.
Here is an example in which I am trying to extract years of experience:
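import spacy
from spacy.matcher import Matcher

nlp = spacy.load("en_core_web_sm")
doc = nlp("1+ years of experience in XX, 2 years of experience in YY")
pattern = [{'POS': 'NUM'}, {'ORTH': '+', "OP": "?"}, {"LOWER": {"REGEX": "years?|months?"}}]
matcher = Matcher(nlp.vocab)
matcher.add("Skills", [pattern])  # list-of-patterns signature (spaCy v2.2.2+ and v3)
matches = matcher(doc)
# Only the first (match_id, start, end) tuple is used here
print(doc[matches[0][1]:matches[0][2]])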
Here I am getting the output 1+ years, which is only the first match.
But I am looking for a solution that gives all the matches as output:
['1+ years', '2 years']
Answer 1
Score: 2
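You should specify the first item as 'LIKE_NUM': True instead of 'POS': 'NUM'. LIKE_NUM is a lexical attribute computed from the token text, so it does not depend on the tagger's part-of-speech prediction: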
pattern = [{'LIKE_NUM': True}, {'ORTH': '+', "OP": "?"}, {"LOWER": {"REGEX": "(?:year|month)s?"}}]
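To see what LIKE_NUM picks up on its own, here is a minimal sketch, assuming spaCy is installed. It uses spacy.blank("en"), a pipeline with no tagger at all, and a hypothetical variation of the question's sentence with the number written as a word ("two"). Since like_num is computed from the token text, it is set for both the digit and the number word.

```python
import spacy

# Assumption: a blank English pipeline (tokenizer only, no tagger) is enough,
# because like_num is derived from the token text, not from a POS prediction.
nlp = spacy.blank("en")

# Hypothetical variation of the question's input, with "two" spelled out
doc = nlp("1+ years of experience in XX, two years of experience in YY")

# Keep the text of every token whose LIKE_NUM flag is True
numeric = [token.text for token in doc if token.like_num]
print(numeric)  # ['1', 'two']
```

The "+" ends up as its own token after "1", which is why the pattern keeps a separate optional {'ORTH': '+'} item.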
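I also shortened years?|months? to (?:year|month)s?. You might even consider matching the full token string with ^(?:year|month)s?$, because the REGEX operator matches anywhere inside the token (without anchors, a token such as "yearly" also matches), but that is not necessary at this point.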
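The regex points above can be checked with plain Python, without spaCy. This is a sketch under the assumption that spaCy's REGEX predicate behaves like re.search on the token's LOWER value; the token list is hypothetical. It shows that the shortened form selects the same tokens as the original one, and that only the anchored form rejects "yearly" and "bimonthly".

```python
import re

# Hypothetical lowercase token strings, standing in for the LOWER attribute
tokens = ["year", "years", "month", "months", "yearly", "bimonthly", "experience"]

original = re.compile(r"years?|months?")
contracted = re.compile(r"(?:year|month)s?")
anchored = re.compile(r"^(?:year|month)s?$")

# re.search looks for the pattern anywhere in the string
by_original = [t for t in tokens if original.search(t)]
by_contracted = [t for t in tokens if contracted.search(t)]
by_anchored = [t for t in tokens if anchored.search(t)]

print(by_original == by_contracted)  # True
print(by_contracted)  # ['year', 'years', 'month', 'months', 'yearly', 'bimonthly']
print(by_anchored)    # ['year', 'years', 'month', 'months']
```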
Code:
import spacy
from spacy.matcher import Matcher

nlp = spacy.load("en_core_web_sm")
matcher = Matcher(nlp.vocab)
pattern = [{'LIKE_NUM': True}, {'ORTH': '+', "OP": "?"}, {"LOWER": {"REGEX": "(?:year|month)s?"}}]
matcher.add("Skills", [pattern])  # list-of-patterns signature (spaCy v2.2.2+ and v3)
doc = nlp("1+ years of experience in XX, 2 years of experience in YY")
matches = matcher(doc)
# Each match is a (match_id, start, end) tuple; print every matched span
for _, start, end in matches:
    print(doc[start:end].text)
Output:
1+ years
2 years
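Since the question asks for a list like ['1+ years', '2 years'] rather than one printed line per match, here is a minimal sketch that collects the matched spans into a Python list. It assumes spaCy is installed and swaps en_core_web_sm for spacy.blank("en") only so that it runs without downloading a model; that works here because LIKE_NUM, ORTH and LOWER are all lexical attributes that need no trained components.

```python
import spacy
from spacy.matcher import Matcher

# Assumption: a blank English pipeline instead of en_core_web_sm,
# which is fine because the pattern uses only lexical attributes.
nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)
pattern = [{"LIKE_NUM": True}, {"ORTH": "+", "OP": "?"}, {"LOWER": {"REGEX": "(?:year|month)s?"}}]
matcher.add("Skills", [pattern])

doc = nlp("1+ years of experience in XX, 2 years of experience in YY")
matches = matcher(doc)

# Slice the Doc with each (start, end) pair and keep the span text
experience = [doc[start:end].text for _, start, end in matches]
print(experience)  # ['1+ years', '2 years']
```

If you switch back to a pattern using 'POS', keep a trained pipeline such as en_core_web_sm, since a blank pipeline has no tagger to assign POS.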