Cannot allocate memory Failed to allocate when using KenLM build_binary
Question
I have an ARPA file, which I created with the following command:
./lmplz -o 4 -S 1G <tmp_100M.txt >100m.arpa
Now I want to convert this ARPA file to a binary file:
./build_binary 100m.arpa 100m.bin
But I get this error:
mmap.cc:225 in void util::HugeMalloc(std::size_t, bool, util::scoped_memory&) threw ErrnoException because `!to.get()'.
Cannot allocate memory Failed to allocate 106122412848 bytes Byte: 80
ERROR
I tried adding the -S parameter:
./build_binary -S 1G 100m.arpa 100m.bin
and I got the same error.
- How can I convert it to a binary file?
- Why am I getting this error?
Answer 1
Score: 1
See https://aclanthology.org/W16-4618 for a brief explanation.
Try this instead:
LM_ORDER=4
CORPUS_LM="tmp_100M"
LANG_E="txt"
LM_ARPA="100m.arpa"
LM_FILE="100m.bin"
${MOSES_BIN_DIR}/lmplz --order ${LM_ORDER} -S 80% -T /tmp \
< ${CORPUS_LM}.${LANG_E} | gzip > ${LM_ARPA}
${MOSES_BIN_DIR}/build_binary trie -a 22 -b 8 -q 8 ${LM_ARPA} ${LM_FILE}
MOSES_BIN_DIR is the directory where your compiled binaries are stored.
If you still run out of memory with the trie and quantization options, you may need to switch to a machine or instance with enough RAM to read your language model and produce the binary.
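To judge whether a given machine is large enough, it can help to put the byte count from the error message next to the memory that is actually available. The following is only a rough sketch: the helper names bytes_to_gib and fits_in_memory are made up for illustration, the byte count is copied from the error in the question, and reading /proc/meminfo assumes a Linux system.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; they are not part of KenLM or Moses.

# Convert a byte count to GiB with one decimal place.
bytes_to_gib() {
    LC_ALL=C awk -v b="$1" 'BEGIN { printf "%.1f\n", b / (1024 * 1024 * 1024) }'
}

# Succeed if a request of $1 bytes fits into $2 KiB of available memory.
fits_in_memory() {
    LC_ALL=C awk -v need="$1" -v avail_kib="$2" \
        'BEGIN { exit !(need <= avail_kib * 1024) }'
}

# Byte count taken from the "Failed to allocate ... bytes" error message.
requested=106122412848
echo "build_binary asked for $(bytes_to_gib "$requested") GiB"

# MemAvailable is reported in KiB; /proc/meminfo is Linux-specific (assumption).
if [ -r /proc/meminfo ]; then
    avail_kib=$(awk '/^MemAvailable:/ { print $2 }' /proc/meminfo)
    avail_kib=${avail_kib:-0}
    if fits_in_memory "$requested" "$avail_kib"; then
        echo "The request fits in available memory"
    else
        echo "The request exceeds available memory ($(bytes_to_gib "$((avail_kib * 1024))") GiB)"
    fi
fi
```

For the error above, this shows a request of roughly 98.8 GiB. As far as I know, running build_binary with only the ARPA file and no output file prints memory estimates for each data structure, which gives a comparable number before any allocation is attempted; check the usage message of your build to confirm.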