Cannot allocate memory: "Failed to allocate" when using KenLM build_binary

Question

I have an ARPA file, which I created with the following command:

 ./lmplz -o 4 -S 1G <tmp_100M.txt >100m.arpa

Now I want to convert this ARPA file to a binary file:

./build_binary 100m.arpa 100m.bin

But I get this error:

mmap.cc:225 in void util::HugeMalloc(std::size_t, bool, util::scoped_memory&) threw ErrnoException because `!to.get()'.
Cannot allocate memory Failed to allocate 106122412848 bytes Byte: 80
ERROR

I tried adding the -S parameter:

./build_binary -S 1G 100m.arpa 100m.bin

but I got the same error.

  1. How can I convert the file to binary?

  2. Why am I getting this error?

Answer 1

Score: 1

Take a look at https://aclanthology.org/W16-4618 for a brief explanation.

Try this instead:

LM_ORDER=4
CORPUS_LM="tmp_100M"
LANG_E="txt"
LM_ARPA="100m.arpa"
LM_FILE="100m.bin"

${MOSES_BIN_DIR}/lmplz --order ${LM_ORDER} -S 80% -T /tmp \
< ${CORPUS_LM}.${LANG_E} | gzip > ${LM_ARPA}

${MOSES_BIN_DIR}/build_binary trie -a 22 -b 8 -q 8 ${LM_ARPA} ${LM_FILE}

MOSES_BIN_DIR is the directory where the binaries you've compiled are stored.
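
Before running the commands above, it can help to confirm that MOSES_BIN_DIR actually contains the two tools. The following is only a sketch: the helper name missing_kenlm_tools and the default path $HOME/mosesdecoder/bin are assumptions made for illustration, not part of KenLM or Moses. As far as I know, calling build_binary with only the ARPA file prints memory estimates for the available data structures instead of building anything, which is a cheap way to see what a build would need.

```shell
#!/bin/sh
# Hypothetical helper (not part of KenLM): print the name of each required
# tool that is missing or not executable in the given directory.
# Prints nothing when both tools are present.
missing_kenlm_tools() {
    dir="$1"
    for tool in lmplz build_binary; do
        [ -x "${dir}/${tool}" ] || echo "${tool}"
    done
}

# Assumed default location; point this at wherever you compiled the binaries.
MOSES_BIN_DIR="${MOSES_BIN_DIR:-$HOME/mosesdecoder/bin}"

missing="$(missing_kenlm_tools "${MOSES_BIN_DIR}")"
if [ -n "${missing}" ]; then
    echo "Missing in ${MOSES_BIN_DIR}:" ${missing} >&2
else
    # With only the ARPA file as an argument, build_binary is expected to
    # print memory estimates rather than write a binary file.
    "${MOSES_BIN_DIR}/build_binary" 100m.arpa
fi
```

If the estimate for the trie structure with your chosen -a, -b and -q values still exceeds the RAM you have, the build will most likely fail in the same way.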


If you still run out of memory with the trie and quantization options, you may need to switch to a machine or instance with enough RAM to read your language model and produce the binary.
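
To judge whether a machine is large enough, you can compare the size of the failed allocation in the error message with the installed RAM. This is a rough sketch, assuming a Linux system where /proc/meminfo reports MemTotal in kB; the function names are made up for this example. The failed request of 106122412848 bytes is roughly 99 GiB, far more than the 1G passed to -S.

```shell
#!/bin/sh
# Convert a byte count to whole GiB (rounded down).
bytes_to_gib() {
    echo $(( $1 / 1024 / 1024 / 1024 ))
}

# Total physical RAM in kB.
# Assumes Linux: /proc/meminfo has a "MemTotal: <n> kB" line.
total_ram_kb() {
    awk '/^MemTotal:/ { print $2 }' /proc/meminfo
}

# Value copied from "Failed to allocate 106122412848 bytes" in the error.
requested=106122412848
echo "Requested: $(bytes_to_gib "${requested}") GiB"

if [ -r /proc/meminfo ]; then
    echo "Installed: $(( $(total_ram_kb) / 1024 / 1024 )) GiB"
fi
```

If the requested figure is well above the installed one, a smaller data structure or a bigger machine is needed; tuning -S alone did not change the outcome in the question.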

Posted on March 20, 2023, 22:35:56
Source: https://go.coder-hub.com/75791645.html
