OCR tessdata directory is incorrect

Question

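I've been following this tutorial to try to create an OCR program. I copied and pasted all of the necessary code and followed the steps, but I keep getting this error when I run OCRDemo.java:

    Error opening data file ./eng.traineddata
    Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
    Failed loading language 'eng'
    Tesseract couldn't load any languages!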

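So I'm assuming the problem is that TESSDATA_PREFIX points to the wrong directory. It's currently set to "C:\CodeRepository\OCR\tessdata", and I confirmed that path by opening the folder in File Explorer and copying and pasting it. I still keep getting the same error. I've also tried "OCR\tessdata" and "tessdata", but none of them work. Any ideas?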

Here's my pom.xml, which sets TESSDATA_PREFIX:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>OCR</groupId>
    <artifactId>OCR</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <properties>
        <TESSDATA_PREFIX>C:\CodeRepository\OCR\tessdata</TESSDATA_PREFIX>
    </properties>
    <dependencies>
        <dependency>
            <groupId>net.sourceforge.tess4j</groupId>
            <artifactId>tess4j</artifactId>
            <version>4.3.1</version>
        </dependency>
    </dependencies>
</project>
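One likely cause, offered as a hedged reading of the error: a Maven `<properties>` entry such as `<TESSDATA_PREFIX>` only defines a property for use inside the POM (for example as `${TESSDATA_PREFIX}`). Maven does not turn it into an environment variable for the program that runs OCRDemo, and when OCRDemo is launched as a plain Java application (for example from the IDE), Maven is typically not running at all. The `./eng.traineddata` path in the message also suggests Tesseract was looking in the current working directory. The minimal sketch below (plain Java, no Tess4J required; the class name `TessdataCheck` is hypothetical) prints what the running program actually sees for `TESSDATA_PREFIX` and whether `eng.traineddata` exists in a directory passed on the command line.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class TessdataCheck {

    /** Path of the language file inside a tessdata directory: <dir>/<lang>.traineddata. */
    static Path trainedDataFile(Path tessdataDir, String lang) {
        return tessdataDir.resolve(lang + ".traineddata");
    }

    /** True if <dir>/<lang>.traineddata exists and is a regular file. */
    static boolean hasTrainedData(Path tessdataDir, String lang) {
        return Files.isRegularFile(trainedDataFile(tessdataDir, lang));
    }

    public static void main(String[] args) {
        // What the running JVM actually sees; a Maven <properties> entry does not show up here.
        String prefix = System.getenv("TESSDATA_PREFIX");
        System.out.println("TESSDATA_PREFIX = " + prefix);
        System.out.println("Working directory = " + Paths.get("").toAbsolutePath());

        // Directory to check: first command-line argument, else the env var, else the working directory.
        String dir = args.length > 0 ? args[0] : (prefix != null ? prefix : ".");
        Path tessdata = Paths.get(dir);
        System.out.println("Looking for " + trainedDataFile(tessdata, "eng").toAbsolutePath()
                + " -> " + (hasTrainedData(tessdata, "eng") ? "found" : "NOT found"));
    }
}
```

If the file is found in `C:\CodeRepository\OCR\tessdata` but `TESSDATA_PREFIX` prints `null`, the directory itself is fine and the variable is simply not reaching the program; setting it at the OS level or in the IDE's run configuration, or passing the directory to `setDatapath` in code, would address that.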

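Answer 1

Score: 0

Judging from the linked tutorial, it seems to point readers to incompatible language data files. Try the files from https://github.com/tesseract-ocr/tessdata_fast instead.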

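Answer 2

Score: 0

    ITesseract instance = new Tesseract();
    instance.setDatapath("C:\\Users\\Tux\\Documents\\tessdata"); // folder that contains eng.traineddata

This worked for me without setting any environment variables. I just put the language file in the 'tessdata' folder.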
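The `setDatapath` approach above can be made a little sturdier by building the path with `java.nio.file` instead of a hand-escaped string literal, and by checking for the language file before Tesseract is initialized, so that a wrong directory fails with a message naming the missing file. This is a minimal sketch, not part of the answer: the helper `resolveTessdata` and class `TessdataPath` are hypothetical, and the `Documents\tessdata` folder under the user's home directory is assumed only to mirror the path in the answer. The Tess4J calls are left as comments so the sketch compiles without the library.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class TessdataPath {

    /**
     * Hypothetical helper: returns the tessdata directory if it contains <lang>.traineddata,
     * otherwise throws with a message naming the exact file that is missing.
     */
    static Path resolveTessdata(Path tessdataDir, String lang) {
        Path trainedData = tessdataDir.resolve(lang + ".traineddata");
        if (!Files.isRegularFile(trainedData)) {
            throw new IllegalStateException("Missing language file: " + trainedData.toAbsolutePath());
        }
        return tessdataDir;
    }

    public static void main(String[] args) {
        // Assumed location mirroring the answer: <user home>\Documents\tessdata.
        // Paths.get joins the segments with the platform separator, so no "\\" escaping is needed.
        Path tessdata = Paths.get(System.getProperty("user.home"), "Documents", "tessdata");
        Path checked = resolveTessdata(tessdata, "eng");
        System.out.println("Using tessdata at " + checked.toAbsolutePath());
        // With Tess4J on the classpath, the checked directory would then be handed over:
        //   ITesseract instance = new Tesseract();
        //   instance.setDatapath(checked.toString());
    }
}
```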

huangapple
  • Posted on 2020-09-22 01:15:06
  • When reposting, please keep the link to this article: https://go.coder-hub.com/63997048.html