Save SpaCy PhraseMatcher to disk
You can save the spaCy PhraseMatcher to disk using Python's pickle module. Here's the code to save and reload the matcher:
import spacy
import pickle
from spacy.matcher import PhraseMatcher

# Load spaCy and create your matcher
nlp = spacy.load("en")
label = "SKILL"
matcher = PhraseMatcher(nlp.vocab)

# Add your phrases to the matcher
for i in list_skills:
    matcher.add(label, None, nlp(i))

# Save the matcher to disk
with open("matcher.pkl", "wb") as file:
    pickle.dump(matcher, file)

# To reload the matcher later:
with open("matcher.pkl", "rb") as file:
    reloaded_matcher = pickle.load(file)
This way, you can reuse the matcher by loading it from disk without having to recreate it every time.
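The round trip itself is easy to sanity-check in memory first (a minimal sketch: a plain dict stands in for the matcher here, since the pickle round trip works the same way for any picklable object):

```python
import pickle

# Hypothetical stand-in data for the PhraseMatcher; the round trip
# is identical for any picklable object.
patterns = {"SKILL": ["python", "machine learning", "sql"]}

blob = pickle.dumps(patterns)    # serialize to bytes in memory
restored = pickle.loads(blob)    # rebuild the object from the bytes

assert restored == patterns
```

pickle.dumps()/loads() on a bytes object is handy for quick tests; pickle.dump()/load() with a file object, as above, is the same operation persisted to disk.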
Question
I am creating a PhraseMatcher with spaCy like this:

import spacy
import time
from spacy.matcher import PhraseMatcher

nlp = spacy.load("en")
label = "SKILL"

print("Creating the matcher...")
start = time.time()
matcher = PhraseMatcher(nlp.vocab)
for i in list_skills:
    matcher.add(label, None, nlp(i))
My list_skills is very big, so creating the matcher takes a long time, and I reuse it very often. Is there a way to save the matcher to disk and reload it later without having to recreate it every time?
Answer 1
Score: 3
You can save some time initially by using nlp.tokenizer.pipe() to process your texts:

for doc in nlp.tokenizer.pipe(list_skills):
    matcher.add(label, None, doc)
This just tokenizes, which is much faster than running the full en pipeline. If you're using certain attr settings with PhraseMatcher, you may need nlp.pipe() instead, but you should get an error if this is the case.
You can pickle a PhraseMatcher to save it to disk. Unpickling is not extremely fast, because it has to reconstruct some internal data structures, but it should be quite a bit faster than creating the PhraseMatcher from scratch.
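The trade-off can be illustrated with a stand-in structure (a hedged sketch: build() and its per-item string work are hypothetical, not spaCy code, but the shape of the comparison is the same — unpickling restores the finished structure wholesale instead of redoing the per-item processing):

```python
import pickle
import time

def build(items):
    # Stand-in for an expensive construction step: per-item work is done
    # for every entry (for spaCy it would be tokenizing each phrase and
    # adding it to the matcher).
    return {s.lower(): len(s.split()) for s in items}

items = ["skill number %d" % i for i in range(20000)]

t0 = time.time()
table = build(items)
build_time = time.time() - t0

blob = pickle.dumps(table)

t0 = time.time()
restored = pickle.loads(blob)
load_time = time.time() - t0

assert restored == table   # the same structure back, without rebuilding it
```

How much unpickling actually saves for a real PhraseMatcher depends on the size of the phrase list; timing both paths on your own data, as the question already does with time.time(), is the reliable check.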
Answer 2
Score: 0
import pickle

filename = 'finalized_matcher.sav'

# Save the matcher to disk
with open(filename, 'wb') as f:
    pickle.dump(matcher, f)

# Reload it later
with open(filename, 'rb') as f:
    loaded_matcher = pickle.load(f)