Gensim Pickle Error: Unable to Load the Saved Topic Model
Question
I am working on topic inference, which requires loading a previously saved model.
However, loading fails with a pickle error:
Traceback (most recent call last):
  File "topic_inference.py", line 35, in <module>
    model_for_inference = gensim.models.LdaModel.load(model_name, mmap = 'r')
  File "topic_modeling/env/lib/python3.8/site-packages/gensim/models/ldamodel.py", line 1663, in load
    result = super(LdaModel, cls).load(fname, *args, **kwargs)
  File "topic_modeling/env/lib/python3.8/site-packages/gensim/utils.py", line 486, in load
    obj = unpickle(fname)
  File "topic_modeling/env/lib/python3.8/site-packages/gensim/utils.py", line 1461, in unpickle
    return _pickle.load(f, encoding='latin1')  # needed because loading from S3 doesn't support readline()
TypeError: __randomstate_ctor() takes from 0 to 1 positional arguments but 2 were given
The code I use to load the model is simply:
gensim.models.LdaModel.load(model_name, mmap='r')
Here is the code I used to create and save the model:
model = gensim.models.ldamulticore.LdaMulticore(
    corpus=comment_corpus,
    id2word=key_word_dict,  # This is now a gensim.corpora.Dictionary object; previously it was the .id2token attribute
    chunksize=chunksize,
    alpha='symmetric',
    eta='auto',
    iterations=iterations,
    num_topics=num_topics,
    passes=epochs,
    eval_every=eval_every,
    workers=15,
    minimum_probability=0.0)
model.save(output_model)
where output_model doesn't have an extension like .model or .pkl.
In the past, I tried a similar approach, except that when creating the model I passed the .id2token attribute of the gensim.corpora.Dictionary object to the id2word parameter instead of the full gensim.corpora.Dictionary, and the model loaded fine back then. I wonder whether passing in a corpora.Dictionary makes a difference when loading the output? Back then I was using regular Python, but now I am using Anaconda. However, all the package versions are the same.
Answer 1
Score: 2
Another report of an error involving __randomstate_ctor (at <https://github.com/numpy/numpy/issues/14210>) suggests the problem may be related to the pickling of numpy objects.
Is there a chance that the configuration where your load is failing uses a later version of numpy than the one in use when the save occurred? Could you try, at least temporarily, rolling back to an older numpy (one that's still sufficient for whatever Gensim you're using) to see if it helps?
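One way to check for a version mismatch is to print the installed package versions in both the saving and the loading environment and compare them. The following is a minimal sketch using only the standard library (Python 3.8+); the helper name installed_versions and the list of packages checked are illustrative assumptions, not something from the question.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Run this in both the environment that saved the model
    # and the environment that fails to load it, then compare.
    for name, version in installed_versions(["numpy", "scipy", "gensim"]).items():
        print(f"{name}: {version or 'not installed'}")
```

If the numpy versions differ between the two environments, aligning them is probably the quickest thing to try first.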
If you find any configuration where the load works, even a suboptimal one, you might be able to null out whatever random-related objects are causing the problem and re-save, giving you a saved version that loads properly in your truly desired configuration. Then, if the random-related objects are truly needed after reload, it may be possible to manually reconstitute them. (I haven't looked into this yet, but if you find a workaround that allows a load and then aren't sure what to manually null or rebuild, I could take a closer look.)
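As a rough sketch of that null-out-and-re-save idea: the helper below is hypothetical (it is not part of gensim) and works on any object with a save(path) method. It assumes the problematic object is the model's random_state attribute, which is where gensim's LdaModel keeps its numpy RandomState; that assumption is untested against this particular failure.

```python
def null_attributes_and_resave(model, save_path, attr_names=("random_state",)):
    """Set the named attributes to None, save the model, and return the old values.

    `model` is anything with a `save(path)` method, such as a loaded gensim
    LdaModel. Hypothetical helper for illustration; not a gensim API.
    """
    removed = {}
    for name in attr_names:
        if hasattr(model, name):
            removed[name] = getattr(model, name)
            setattr(model, name, None)
    model.save(save_path)
    return removed


# Usage sketch (assumes gensim and numpy are installed; the "_portable"
# suffix and the seed 42 are placeholders):
#
#   In an environment where the load succeeds:
#     model = gensim.models.LdaModel.load(model_name)
#     null_attributes_and_resave(model, model_name + "_portable")
#
#   In the target environment, before running inference:
#     model = gensim.models.LdaModel.load(model_name + "_portable")
#     model.random_state = numpy.random.RandomState(42)
```

Rebuilding random_state with a fresh seed means the reloaded model will not reproduce the original random stream, which should only matter if exact reproducibility of inference is required.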