Getting errors while using pdfbox to parse PDF files in Java

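Question

Please advise: I am getting errors while using pdfbox to parse PDF files in Java. All the necessary libraries are imported, but I still get errors.

Please tell me what I am doing wrong and suggest a possible solution.

The compiler errors and my Java code are below.

Error:(32, 10) java: illegal start of type
Error:(32, 13) java: illegal start of type
Error:(32, 14) java: ')' expected
Error:(32, 18) java: ';' expected
Error:(32, 19) java: invalid method declaration; return type required
Error:(32, 27) java: ';' expected
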
import org.apache.pdfbox.cos.COSDocument;
import org.apache.pdfbox.io.RandomAccessRead;
import org.apache.pdfbox.pdfparser.PDFParser;
import org.apache.pdfbox.pdmodel.PDDocument;

import java.beans.XMLDecoder;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

public class PdfReader {
    private XMLDecoder cosDoc = null;

    private XMLDecoder pdDoc = null;
    static File file = new File("data/javaPDF.pdf");

    private static PDFParser parser = null;

    public static void main(String args[]) {
        PDFTextParser pdf = new PDFTextParser();
        //print out results
        System.out.println(pdf.getParsedText());
        //
    }

    private static class PDFTextParser {

        if(!file.isFile())

        {
            String fileName = null;
            System.err.println("File " + fileName + " does not exist.");
        }
        //Set up instance of PDF parser


        {
            try {
                parser = new PDFParser((RandomAccessRead) new FileInputStream(file));
            } catch (IOException e) {
                e.printStackTrace();
            }
        }


    }
    //-------------------------------

    public static String getParsedText() {
        PDDocument pdDoc = null;
        COSDocument cosDoc = null;

        String parsedText = null;
        try {
            parser.parse();
        } catch (IOException e1) {
            e1.printStackTrace();
        }
        try {
            cosDoc = parser.getDocument();
        } catch (IOException e1) {
            e1.printStackTrace();
        }
        pdDoc = new PDDocument(cosDoc);

        return parsedText;
    }
//            System.err.println("An exception occured in parsing the PDF Document." + e.getMessage());

    {
        if (cosDoc != null) cosDoc.close();
        if (pdDoc != null) pdDoc.close();
    }
}

Answer 1

Score: 0

Thanks so much, everyone; it's been resolved.

Here's the solution:

package com.mypackage.util;

import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.encryption.AccessPermission;
import org.apache.pdfbox.text.PDFTextStripper;

public class ExtractTextSimple
{
    private ExtractTextSimple()
    {
        // example class should not be instantiated
    }

    public static void main(String[] args) throws IOException
    {
        // PDDocument.load is the PDFBox 2.x entry point. try-with-resources
        // closes the document once extraction finishes.
        try (PDDocument document = PDDocument.load(new File("data/javaPDF.pdf")))
        {
            System.out.println("No of pages >>" + document.getNumberOfPages());
            AccessPermission ap = document.getCurrentAccessPermission();
            if (!ap.canExtractContent())
            {
                throw new IOException("You do not have permission to extract text");
            }

            PDFTextStripper stripper = new PDFTextStripper();

            // sort text by its position on the page rather than by stream order
            stripper.setSortByPosition(true);

            for (int p = 1; p <= document.getNumberOfPages(); ++p)
            {
                // Set the page interval to extract. If you don't, then all pages would be extracted.
                stripper.setStartPage(p);
                stripper.setEndPage(p);

                // let the magic happen
                String text = stripper.getText(document);
                System.out.println("Text >>" + text);
                // do some nice output with a header
                String pageStr = String.format("page %d:", p);
                System.out.println(pageStr);
                for (int i = 0; i < pageStr.length(); ++i)
                {
                    System.out.print("-");
                }
                System.out.println();
                System.out.println(text.trim());
                System.out.println();
            }
        }
    }
}
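
A note on the compiler errors in the question: they are most likely caused by the `if (!file.isFile())` statement sitting directly in the body of the `PDFTextParser` class. Java allows only declarations and initializer blocks at class level, so a bare statement there produces errors such as `illegal start of type`. The sketch below is a minimal illustration of moving such a check into a method. It uses only the standard library, and the names `ClassBodyDemo`, `FileChecker`, and `describe` are hypothetical, chosen for this example rather than taken from the question or from PDFBox.

```java
import java.io.File;

public class ClassBodyDemo {

    // Hypothetical helper class, named for illustration only.
    static class FileChecker {
        private final File file;

        // Statements such as `if` must live inside a constructor, method,
        // or initializer block. Written directly in the class body, javac
        // rejects them with errors like "illegal start of type".
        FileChecker(File file) {
            this.file = file;
        }

        // Returns a short message saying whether the path is an existing file.
        String describe() {
            if (!file.isFile()) {
                return "File " + file.getPath() + " does not exist.";
            }
            return "File " + file.getPath() + " found.";
        }
    }

    public static void main(String[] args) {
        // Same relative path as in the question; adjust as needed.
        FileChecker checker = new FileChecker(new File("data/javaPDF.pdf"));
        System.out.println(checker.describe());
    }
}
```

Run from a directory that has no `data/javaPDF.pdf`, this prints a "does not exist" message at runtime instead of failing to compile.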

huangapple
  • Posted on August 19, 2020 at 06:17:49
  • Please keep this link when reposting: https://go.coder-hub.com/63477381.html