Save SpaCy PhraseMatcher to disk
You can save the SpaCy matcher to disk using Python's pickle
module. Here's the code to save and reload the matcher:
import spacy
import pickle
from spacy.matcher import PhraseMatcher
# Load SpaCy and create your matcher
nlp = spacy.load("en")
label = "SKILL"
matcher = PhraseMatcher(nlp.vocab)
# Add your phrases to the matcher
for i in list_skills:
    matcher.add(label, None, nlp(i))

# Save the matcher to disk
with open("matcher.pkl", "wb") as file:
    pickle.dump(matcher, file)

# To reload the matcher later:
with open("matcher.pkl", "rb") as file:
    reloaded_matcher = pickle.load(file)
This way, you can reuse the matcher by loading it from disk without having to recreate it every time.
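As a quick sanity check, here is a minimal sketch of using the reloaded matcher in the same session (the sample text below is a made-up example and assumes list_skills contains matching phrases):

# Use the reloaded matcher on a document (illustrative example)
doc = nlp("We are looking for Python and machine learning experience.")
matches = reloaded_matcher(doc)
for match_id, start, end in matches:
    print(nlp.vocab.strings[match_id], doc[start:end].text)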
Question
I am creating a PhraseMatcher with SpaCy like this:

import spacy
import time
from spacy.matcher import PhraseMatcher

nlp = spacy.load("en")
label = "SKILL"

print("Creating the matcher...")
start = time.time()
matcher = PhraseMatcher(nlp.vocab)
for i in list_skills:
    matcher.add(label, None, nlp(i))
My list_skills is very big, so creating the matcher takes a long time, and I reuse it very often. Is there a way to save the matcher to disk and reload it later without having to recreate it every time?
Answer 1
Score: 3
You can save some time initially by using nlp.tokenizer.pipe() to process your texts:

for doc in nlp.tokenizer.pipe(list_skills):
    matcher.add(label, None, doc)

This just tokenizes, which is much faster than running the full en pipeline. If you're using certain attr settings with PhraseMatcher, you may need nlp.pipe() instead, but you should get an error if that is the case.
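For example, here is a minimal sketch of that situation (illustrative only, not from the original answer): an attribute such as LEMMA needs the full pipeline, so the pattern docs come from nlp.pipe() rather than nlp.tokenizer.pipe(). The spaCy 2.x add() signature from the question is assumed.

from spacy.matcher import PhraseMatcher

# Illustrative sketch: attr="LEMMA" matches on lemmas, which requires the
# tagger/lemmatizer, so process the patterns with the full pipeline.
lemma_matcher = PhraseMatcher(nlp.vocab, attr="LEMMA")
for doc in nlp.pipe(list_skills):
    lemma_matcher.add(label, None, doc)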
You can pickle a PhraseMatcher to save it to disk. Unpickling is not extremely fast because it has to reconstruct some internal data structures, but it should be quite a bit faster than creating the PhraseMatcher from scratch.
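To see how much that buys you for your own list_skills, here is a rough timing sketch (the file name matcher.pkl is just a hypothetical choice):

import pickle
import time

# Save the already-built matcher once (illustrative file name)
with open("matcher.pkl", "wb") as f:
    pickle.dump(matcher, f)

# Time how long reloading takes, to compare against building it from scratch
t0 = time.time()
with open("matcher.pkl", "rb") as f:
    reloaded_matcher = pickle.load(f)
print("Unpickling took", time.time() - t0, "seconds")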
Answer 2
Score: 0
import pickle

# Save the matcher to disk
filename = 'finalized_matcher.sav'
pickle.dump(matcher, open(filename, 'wb'))

# Load it back later
loaded_matcher = pickle.load(open(filename, 'rb'))