2023-03-20 22:35:56

Cannot allocate memory Failed to allocate when using KenLM build_binary

Question

I have an ARPA file, which I created with the following command:

 ./lmplz -o 4 -S 1G <tmp_100M.txt >100m.arpa

Now I want to convert this ARPA file to a binary file:

./build_binary 100m.arpa 100m.bin

But I get this error:

mmap.cc:225 in void util::HugeMalloc(std::size_t, bool, util::scoped_memory&) threw ErrnoException because `!to.get()'.
Cannot allocate memory Failed to allocate 106122412848 bytes Byte: 80
ERROR

I tried adding the -S parameter:

./build_binary -S 1G 100m.arpa 100m.bin

but I got the same error.

How can I convert it to a binary file?
Why am I getting this error?

Answer 1

Score: 1

Take a look at https://aclanthology.org/W16-4618 for a brief explanation.

Try this instead:

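LM_ORDER=4
CORPUS_LM="tmp_100M"
LANG_E="txt"
LM_ARPA="100m.arpa"
LM_FILE="100m.bin"
# Estimate the model; -S 80% caps memory at 80% of RAM, -T sets the temp directory.
# Note: the output is gzip-compressed even though the file name ends in .arpa.
${MOSES_BIN_DIR}/lmplz --order ${LM_ORDER} -S 80% -T /tmp \
< ${CORPUS_LM}.${LANG_E} | gzip > ${LM_ARPA}
# Build a trie model with 8-bit quantized probabilities (-q) and backoffs (-b),
# plus pointer compression (-a), to reduce memory use.
${MOSES_BIN_DIR}/build_binary trie -a 22 -b 8 -q 8 ${LM_ARPA} ${LM_FILE}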

MOSES_BIN_DIR is the directory where the binaries you've compiled are stored.
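
Before building the binary, it can help to look at the n-gram counts in the ARPA header, since the number of n-grams is what drives the size of the model. The following is a minimal sketch, not part of KenLM: `arpa_counts` is a hypothetical helper name, and it assumes GNU gzip, where `gzip -cdf` passes uncompressed input through unchanged, so it should work on both plain and gzip-compressed ARPA files.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the "ngram N=count" lines from an ARPA header.
# Assumes GNU gzip: with -c -d -f, input that is not gzip data is copied as-is.
arpa_counts() {
  gzip -cdf -- "$1" |
    awk '/^\\1-grams:/ { exit }      # stop at the first n-gram section
         /^ngram [0-9]+=/ { print }  # header lines such as "ngram 1=12345"'
}

# Usage (file name taken from the question):
#   arpa_counts 100m.arpa
```

Unexpectedly large counts in this header would be one place to start looking when the allocation request seems out of proportion to the corpus.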

If you still run out of memory with the trie and quantization options, you may need to switch to a machine or instance with enough RAM to read your language model and produce the binary.
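
To judge whether a given machine has enough RAM, the byte count from the error message can be compared with the memory that is currently available. This is a rough sketch under stated assumptions: `bytes_to_gib` is a hypothetical helper, and the availability check assumes Linux, where `/proc/meminfo` reports `MemAvailable` in kB.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a byte count to GiB with two decimals.
bytes_to_gib() {
  awk -v b="$1" 'BEGIN { printf "%.2f\n", b / (1024 * 1024 * 1024) }'
}

# Size requested in the error message from the question.
requested=106122412848
echo "requested: $(bytes_to_gib "$requested") GiB"

# Assumption: Linux, where /proc/meminfo lists MemAvailable in kB.
if [ -r /proc/meminfo ]; then
  avail_kb=$(awk '/^MemAvailable:/ { print $2 }' /proc/meminfo)
  if [ -n "$avail_kb" ]; then
    echo "available: $(bytes_to_gib "$((avail_kb * 1024))") GiB"
  fi
fi
```

The request in the question works out to roughly 98.8 GiB, so the allocation fails on any machine with less free memory than that.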
