OCR tessdata directory is incorrect
Question
I've been following this tutorial to create an OCR application. I've copied and pasted all of the necessary code and followed the steps, but I keep receiving this error when I run OCRDemo.java:
```
Error opening data file ./eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
Failed loading language 'eng'
Tesseract couldn't load any languages!
```
So I'm assuming the issue is that TESSDATA_PREFIX points to the wrong directory. It is currently set to "C:\CodeRepository\OCR\tessdata", and I confirmed that path by opening File Explorer and copying it directly from the address bar. I still get the same error. I've also tried "OCR\tessdata" and "tessdata", but neither works. Any help?
Here's my pom.xml, which sets TESSDATA_PREFIX:
```xml
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>OCR</groupId>
  <artifactId>OCR</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <properties>
    <TESSDATA_PREFIX>C:\CodeRepository\OCR\tessdata</TESSDATA_PREFIX>
  </properties>
  <dependencies>
    <dependency>
      <groupId>net.sourceforge.tess4j</groupId>
      <artifactId>tess4j</artifactId>
      <version>4.3.1</version>
    </dependency>
  </dependencies>
</project>
```
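One thing worth checking: a `<properties>` entry in pom.xml defines a Maven build property used for interpolation inside the build; it does not export an operating-system environment variable to the JVM that runs OCRDemo. The `./` in the error message suggests Tesseract fell back to looking in the current working directory. The minimal sketch below (the class name `TessdataPrefixCheck` is hypothetical and not part of Tess4J) prints what the running JVM actually sees for `TESSDATA_PREFIX`, assuming you run it with the same run configuration as OCRDemo.

```java
import java.util.Map;

/**
 * Hypothetical diagnostic (not part of Tess4J): reports what the running JVM
 * sees for TESSDATA_PREFIX. A <properties> entry in pom.xml is a Maven build
 * property, so it only shows up here if the variable is also set in the OS
 * environment or in the IDE's run configuration.
 */
final class TessdataPrefixCheck {

    /** Builds a human-readable report from an environment map. */
    static String describe(Map<String, String> env) {
        String prefix = env.get("TESSDATA_PREFIX");
        if (prefix == null || prefix.trim().isEmpty()) {
            return "TESSDATA_PREFIX is not set for this JVM";
        }
        return "TESSDATA_PREFIX=" + prefix;
    }

    public static void main(String[] args) {
        // System.getenv() returns the environment the JVM was started with.
        System.out.println(describe(System.getenv()));
    }
}
```

If this reports that the variable is not set, the pom.xml entry is not reaching the program, and the variable has to be set outside Maven (or the data path passed to Tess4J directly).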
Answer 1
Score: 0
Judging by the linked tutorial, it points readers to incompatible language data files. Try the files from https://github.com/tesseract-ocr/tessdata_fast instead.
Answer 2
Score: 0
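```java
import net.sourceforge.tess4j.ITesseract;
import net.sourceforge.tess4j.Tesseract;
ITesseract instance = new Tesseract();
instance.setDatapath("C:\\Users\\Tux\\Documents\\tessdata"); // folder that holds eng.traineddata
```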
This worked for me without setting any environment variables. I just put the language file (eng.traineddata) in the 'tessdata' folder.
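Building on the `setDatapath` approach, it can help to verify that the folder really contains the language file before handing it to Tesseract, so a wrong path fails with a readable Java exception instead of the native "Error opening data file" message. This is a minimal sketch; `TessdataPaths` and `requireTraineddata` are hypothetical names, not Tess4J API, and the Windows path is just the one from the answer above. Note that backslashes must be doubled inside a Java string literal (a single `\U` does not compile).

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/**
 * Hypothetical helper (not part of Tess4J): checks that a tessdata folder
 * contains <lang>.traineddata before the folder is passed to setDatapath().
 */
final class TessdataPaths {

    /** Returns the path to <lang>.traineddata, or throws if it is missing. */
    static Path requireTraineddata(Path datapath, String lang) {
        Path file = datapath.resolve(lang + ".traineddata");
        if (!Files.isRegularFile(file)) {
            throw new IllegalArgumentException("Missing language file: " + file.toAbsolutePath());
        }
        return file;
    }

    public static void main(String[] args) {
        // Assumed location taken from the answer; adjust to your machine.
        Path datapath = Paths.get("C:\\Users\\Tux\\Documents\\tessdata");
        Path eng = requireTraineddata(datapath, "eng");
        System.out.println("Found " + eng);
        // With Tess4J on the classpath you would then call:
        // ITesseract instance = new Tesseract();
        // instance.setDatapath(datapath.toString());
    }
}
```

Checking with `Files.isRegularFile` keeps the helper free of any Tess4J dependency, so it can run even before the OCR library is configured.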