Paperless-ngx redo OCR for documents
Question
I'm trying to redo the OCR for my documents on Paperless-ngx, because some obvious text in the PDFs is missing or was not indexed automatically. What should I do to redo OCR for specific documents?
I'm using the Docker installation, so I have the following containers running:
paperless-webserver-1
paperless-broker-1
paperless-db-1
paperless-gotenberg-1
paperless-tika-1
I found the following discussion on the GitHub page, but it doesn't say how to actually do it, just "implemented".
Their documentation also mentions PAPERLESS_OCR_MODE=<mode>, but again there is no example, and I couldn't find where to apply the setting.
Thank you
Answer 1
Score: 1
You can force OCR to run again on a single document with this command, replacing the bracketed placeholder with the document's ID:
docker exec -d -e "PAPERLESS_OCR_MODE=force" paperless-webserver-1 document_archiver --overwrite --document [HERE_COMES_THE_DOCUMENT_ID]
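If several documents need the same treatment, the command above can be wrapped in a small loop. The following is only a sketch under a few assumptions: the container is named paperless-webserver-1 as in the question, and the helper names reocr_cmd and reocr_docs as well as the DRY_RUN switch are made up for this example and are not part of Paperless-ngx. It leaves out -d so that each run stays in the foreground and any error is visible.

```shell
#!/bin/sh
# Assumption: container name taken from the question; adjust to match `docker ps`.
CONTAINER="${CONTAINER:-paperless-webserver-1}"

# Hypothetical helper: print the re-OCR command for one document ID.
reocr_cmd() {
    printf 'docker exec -e PAPERLESS_OCR_MODE=force %s document_archiver --overwrite --document %s\n' \
        "$CONTAINER" "$1"
}

# Hypothetical helper: re-OCR every ID passed as an argument, one at a time.
# With DRY_RUN=1 the commands are only printed, not executed.
reocr_docs() {
    for id in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            reocr_cmd "$id"
        else
            docker exec -e PAPERLESS_OCR_MODE=force "$CONTAINER" \
                document_archiver --overwrite --document "$id"
        fi
    done
}
```

After sourcing the snippet, setting DRY_RUN=1 and calling reocr_docs 12 15 27 prints the three commands so they can be checked first; unsetting DRY_RUN and calling it again runs them in sequence.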