Question

I'm trying to redo the OCR for my documents in Paperless-ngx because some obvious text in the PDFs is missing or was not indexed automatically. What should I do to redo OCR for specific documents?

I'm using the Docker installation, so I have the following containers running:

paperless-webserver-1
paperless-broker-1
paperless-db-1
paperless-gotenberg-1
paperless-tika-1

I found the following discussion on the GitHub page, but it doesn't explain how to actually do it; it only says "implemented".

Their documentation also mentions PAPERLESS_OCR_MODE=<mode>, but again there is no example, and I couldn't find where to apply the setting.

Thank you

Answer 1

Score: 1

You can force OCR for a single document by running this command, which sets PAPERLESS_OCR_MODE for that one invocation only:

docker exec -d -e "PAPERLESS_OCR_MODE=force" paperless-webserver-1 document_archiver --overwrite --document [HERE_COMES_THE_DOCUMENT_ID]
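
If you need to redo several documents, the same command can be wrapped in a small loop. This is only a sketch under a few assumptions: the IDs 12, 34 and 56 are made-up placeholders, the `reocr` helper and the `DRY_RUN` and `CONTAINER` variables are hypothetical names of my own, and `-d` is dropped so each run finishes before the next one starts. By default it only prints the commands so you can check them first.

```shell
#!/bin/sh
# Sketch: re-run OCR for several documents, one after another.
# CONTAINER and DRY_RUN are hypothetical helper variables, not Paperless-ngx settings.
CONTAINER="${CONTAINER:-paperless-webserver-1}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the command, 0 = actually run it

reocr() {
    # Build the same command as in the answer above, without -d (runs in the foreground).
    set -- docker exec -e PAPERLESS_OCR_MODE=force "$CONTAINER" \
        document_archiver --overwrite --document "$1"
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Placeholder document IDs; replace them with your own.
for id in 12 34 56; do
    reocr "$id"
done
```

Run it once as-is to review the printed commands, then run it again with `DRY_RUN=0` to execute them.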
