2023-02-18 08:02:00

Gensim Pickle Error: Unable to Load the Saved Topic Model

Question

I am working on topic inference, which requires loading a previously saved model.

However, I get a pickle error:

Traceback (most recent call last):
  File "topic_inference.py", line 35, in <module>
    model_for_inference = gensim.models.LdaModel.load(model_name, mmap = 'r')
  File "topic_modeling/env/lib/python3.8/site-packages/gensim/models/ldamodel.py", line 1663, in load
    result = super(LdaModel, cls).load(fname, *args, **kwargs)
  File "topic_modeling/env/lib/python3.8/site-packages/gensim/utils.py", line 486, in load
    obj = unpickle(fname)
  File "topic_modeling/env/lib/python3.8/site-packages/gensim/utils.py", line 1461, in unpickle
    return _pickle.load(f, encoding='latin1')  # needed because loading from S3 doesn't support readline()
TypeError: __randomstate_ctor() takes from 0 to 1 positional arguments but 2 were given

The code I use to load the model is simply:

gensim.models.LdaModel.load(model_name, mmap='r')

Here is the code that I use to create and save the model:

model = gensim.models.ldamulticore.LdaMulticore(
    corpus=comment_corpus,
    id2word=key_word_dict,  # now a gensim.corpora.Dictionary object; previously its .id2token attribute
    chunksize=chunksize,
    alpha='symmetric',
    eta='auto',
    iterations=iterations,
    num_topics=num_topics,
    passes=epochs,
    eval_every=eval_every,
    workers=15,
    minimum_probability=0.0,
)

model.save(output_model)

Here output_model is a path without an extension such as .model or .pkl.

In the past, I used a similar approach, except that I passed the .id2token attribute of the gensim.corpora.Dictionary object, rather than the full gensim.corpora.Dictionary, to the id2word parameter when creating the model, and the model loaded fine back then. I wonder whether passing in a corpora.Dictionary makes a difference when loading the output. Back then I was using regular Python, but now I am using Anaconda. However, all the package versions are the same.

Answer 1

Score: 2

Another report of an error about __randomstate_ctor (at <https://github.com/numpy/numpy/issues/14210>) suggests the problem may be related to numpy object pickling.

Is there a chance that the configuration where your load is failing is using a later version of numpy than when the save occurred? Could you try, at least temporarily, rolling back to some older numpy (that's still sufficient for whatever Gensim you're using) to see if it helps?
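
One way to check the version question is to record the package versions next to the model at save time and compare them at load time. The following is a minimal sketch, not part of Gensim: the helper names and the idea of a JSON manifest file are assumptions made for illustration, and the package list is only a guess at what is relevant here.

```python
import json
import sys
from importlib import metadata


def installed_versions(packages=("numpy", "scipy", "gensim")):
    """Return {package: version or None} for the current environment."""
    versions = {"python": sys.version.split()[0]}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions


def save_manifest(path, packages=("numpy", "scipy", "gensim")):
    """Write the current versions to a JSON file (hypothetical manifest)."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(installed_versions(packages), handle, indent=2)


def version_mismatches(saved, current):
    """Return {name: (saved_version, current_version)} where the two differ."""
    names = sorted(set(saved) | set(current))
    return {
        name: (saved.get(name), current.get(name))
        for name in names
        if saved.get(name) != current.get(name)
    }
```

In the saving environment, calling `save_manifest` right after `model.save(output_model)` would capture the versions; in the loading environment, reading that file with `json.load` and passing it to `version_mismatches` together with `installed_versions()` would show whether numpy (or anything else) differs between the two.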

If you find any configuration where the load works, even a suboptimal one, you might be able to null out whatever random-related objects are causing the problem and re-save, giving you a saved version that loads better in your truly desired configuration. Then, if the random-related objects are truly needed after reload, it may be possible to manually reconstitute them. (I haven't looked into this yet, but if you find any workaround allowing a load and then aren't sure what to manually null or rebuild, I could take a closer look.)
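
The null-out-and-rebuild idea could be sketched roughly as below. This is untested against the failing model and rests on an assumption: that the object behind the `__randomstate_ctor` error is the model's `random_state` attribute. Both helper names are hypothetical, and other objects inside the model may also need the same treatment.

```python
def strip_random_state(model, attribute="random_state"):
    """Set the model's random-state attribute to None and return the old value.

    Assumption: `attribute` names the object whose pickling causes the error.
    """
    previous = getattr(model, attribute, None)
    setattr(model, attribute, None)
    return previous


def rebuild_random_state(model, factory, attribute="random_state"):
    """Attach a freshly built random-state object if none is present.

    `factory` is any zero-argument callable, for example
    `lambda: numpy.random.RandomState()` when the model expects a NumPy object.
    """
    if getattr(model, attribute, None) is None:
        setattr(model, attribute, factory())
    return model
```

In an environment where `gensim.models.LdaModel.load(model_name)` succeeds, one would call `strip_random_state(model)` and then `model.save(...)` under a new name. In the target environment, one would load that new file and call `rebuild_random_state` with a NumPy `RandomState` factory. A freshly built random state does not continue the original random sequence, so anything that depends on it after reload will not reproduce earlier runs exactly.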
