January 6, 2020 22:33:53

All matches in a line: spaCy Matcher

Question

I am looking for a way to print all the matches in a line using the spaCy Matcher.

For example, here I am trying to extract the experience durations:

import spacy
from spacy.matcher import Matcher

nlp = spacy.load("en_core_web_sm")
doc = nlp("1+ years of experience in XX, 2 years of experiance in YY")
pattern = [{'POS': 'NUM'}, {'ORTH': '+', "OP": "?"}, {"LOWER": {"REGEX": "years?|months?"}}]
matcher = Matcher(nlp.vocab)
matcher.add("Skills", None, pattern)
matches = matcher(doc)
print(doc[matches[0][1]:matches[0][2]])  # prints only the first match

Here I am getting the output 1+ years.

But I am looking for a solution that outputs
['1+ years', '2 years']

Answer 1

Score: 2

You should specify the first item as 'LIKE_NUM': True, so the pattern no longer depends on the part-of-speech tagger labelling each number as NUM:

pattern = [{'LIKE_NUM': True}, {'ORTH': '+', "OP": "?"}, {"LOWER": {"REGEX": "(?:year|month)s?"}}]

I also shortened years?|months? to (?:year|month)s?. You might even consider matching the full token string with ^(?:year|month)s?$, but that is not necessary at this point.
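To illustrate why the anchored form can matter, here is a small sketch using only Python's re module. It assumes the REGEX operator tests the pattern with search semantics, meaning a hit anywhere inside the token text counts; the token list is made up for the demonstration.

```python
import re

# The two variants discussed above: unanchored vs. anchored to the full token.
unanchored = re.compile(r"(?:year|month)s?")
anchored = re.compile(r"^(?:year|month)s?$")

# Hypothetical token texts, already lowercased as the LOWER attribute would be.
tokens = ["years", "month", "yearly", "bimonthly", "2"]

# search() succeeds if the pattern occurs anywhere in the string.
loose = [t for t in tokens if unanchored.search(t)]
# With ^ and $ the whole token has to be year, years, month or months.
strict = [t for t in tokens if anchored.search(t)]

print(loose)   # ['years', 'month', 'yearly', 'bimonthly']
print(strict)  # ['years', 'month']
```

For the example sentence in the question both variants give the same result, since none of its tokens merely contains "year" or "month" as a substring.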

Code:

import spacy
from spacy.matcher import Matcher

nlp = spacy.load("en_core_web_sm")
matcher = Matcher(nlp.vocab)
# LIKE_NUM matches number-like tokens; the optional '+' token covers "1+"
pattern = [{'LIKE_NUM': True}, {'ORTH': '+', "OP": "?"}, {"LOWER": {"REGEX": "(?:year|month)s?"}}]
# spaCy v2 signature; spaCy v3 expects matcher.add("Skills", [pattern])
matcher.add("Skills", None, pattern)
doc = nlp("1+ years of experience in XX, 2 years of experiance in YY")
matches = matcher(doc)
# Each match is a (match_id, start, end) tuple of token indices
for _, start, end in matches:
    print(doc[start:end].text)

Output:

1+ years
2 years
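
To get the list shape the question asks for, the matched spans can be collected with a list comprehension instead of being printed one by one. This is only a sketch under two assumptions that are not part of the original answer: it uses spacy.blank("en") instead of en_core_web_sm, which should be enough here because LIKE_NUM, ORTH and LOWER are lexical attributes that do not need a trained model, and it uses the list-style matcher.add("Skills", [pattern]) call that spaCy v3 expects.

```python
import spacy
from spacy.matcher import Matcher

# Assumption: a blank English pipeline (tokenizer only) is sufficient,
# because the pattern relies on lexical attributes only.
nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)
pattern = [
    {"LIKE_NUM": True},
    {"ORTH": "+", "OP": "?"},
    {"LOWER": {"REGEX": "(?:year|month)s?"}},
]
# List-of-patterns form of Matcher.add (the spaCy v3 signature).
matcher.add("Skills", [pattern])

doc = nlp("1+ years of experience in XX, 2 years of experiance in YY")
# Turn every (match_id, start, end) tuple into the text of its span.
durations = [doc[start:end].text for _, start, end in matcher(doc)]
print(durations)
```

On an older spaCy v2 installation, the matcher.add line would need the v2 form shown in the answer's code.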
