Paperless-ngx: redo OCR for documents

Question

I'm trying to redo the OCR for my documents in Paperless-ngx, because some obvious text in the PDFs is missing or was not indexed automatically. What should I do to redo OCR for specific documents?

I'm using the Docker installation, so I have the following containers running:

paperless-webserver-1
paperless-broker-1
paperless-db-1
paperless-gotenberg-1
paperless-tika-1

I found the following discussion on the GitHub page, but it doesn't say how to actually do it, only that it is "implemented".

Their documentation also mentions PAPERLESS_OCR_MODE=<mode>.
Again, though, there is no example, and I couldn't find where to apply the setting.

Thank you.

Answer 1

Score: 1

You can force OCR to be redone for a single document by running this command:

docker exec -d -e "PAPERLESS_OCR_MODE=force" paperless-webserver-1 document_archiver --overwrite --document [HERE_COMES_THE_DOCUMENT_ID]
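
If several documents need the same treatment, the command above can be wrapped in a small loop. The following is only a sketch under a few assumptions: the container is named paperless-webserver-1 as in the question, the document IDs are passed as script arguments, and the CONTAINER and DOCKER variables and the reocr function are hypothetical names introduced here for illustration. Unlike the command above, it leaves out -d so that output and errors from each run stay visible.

```shell
#!/bin/sh
# Sketch: force a re-OCR for each document ID given as an argument.
# CONTAINER and DOCKER are illustrative variables, not Paperless-ngx settings.
CONTAINER="${CONTAINER:-paperless-webserver-1}"
DOCKER="${DOCKER:-docker}"

reocr() {
    # $1 = document ID
    # -e sets PAPERLESS_OCR_MODE only for this one exec call.
    "$DOCKER" exec -e "PAPERLESS_OCR_MODE=force" "$CONTAINER" \
        document_archiver --overwrite --document "$1"
}

for id in "$@"; do
    echo "Re-running OCR for document $id"
    # Report a failure but keep going with the remaining IDs.
    reocr "$id" || echo "document $id failed" >&2
done
```

Saved as, say, reocr.sh, it could be called as `sh reocr.sh 12 15 27`. Running it with DOCKER=echo prints the commands instead of executing them, which is a cheap way to check them first. Because the variable is passed with -e, the forced mode applies only to these runs and the container's normal OCR configuration is left unchanged.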

huangapple
  • Posted on 2023-03-21 00:51:23
  • When reposting, please keep the link to this post: https://go.coder-hub.com/75793124.html