Convert PDF to JPEG

Question

I'm trying to build a process that converts a base64-encoded PDF into a base64-encoded JPEG image. The problem is that if the PDF was scanned at less than 600 dpi, the image is not rendered correctly: I get a white image without any content in it.

This is my code:

public String convertPDFtoJPEG(byte[] pdfData) {
    try {
        PDDocument document = PDDocument.load(pdfData);
        PDFRenderer renderer = new PDFRenderer(document);

        for (int pageIndex = 0; pageIndex < document.getNumberOfPages(); pageIndex++) {
            BufferedImage image = renderer.renderImageWithDPI(pageIndex, 300);
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            ImageIO.write(image, "JPEG", baos);
            baos.flush();
            String base64JPEG = Base64.getEncoder().encodeToString(baos.toByteArray());
            baos.close();
        }

        document.close();
    } catch (IOException e) {
        e.printStackTrace();
    }
    return base64JPEG;
}

public byte[] readBase64FromFile(String filePath) {
    File file = new File(filePath);
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    try (BufferedReader reader = new BufferedReader(new FileReader(file))) {
        String line;
        while ((line = reader.readLine()) != null) {
            baos.write(line.getBytes());
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
    return Base64.getDecoder().decode(baos.toByteArray());
}

This is how I'm calling these methods:

String filePath = "path:\\to\\file.txt";
byte[] pdfData = readBase64FromFile(filePath);
convertPDFtoJPEG(pdfData);

file.txt contains the base64-encoded PDF.

Answer 1

Score: 1

As discussed in the comments, make sure you have the dependencies for JPX and JBIG2 image formats:

<dependency>
    <groupId>com.github.jai-imageio</groupId>
    <artifactId>jai-imageio-jpeg2000</artifactId>
    <version>1.4.0</version>
</dependency>
<dependency>
    <groupId>com.github.jai-imageio</groupId>
    <artifactId>jai-imageio-core</artifactId>
    <version>1.4.0</version>
</dependency>

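I also recommend using the TwelveMonkeys ImageIO library for TIFF and JPEG. Note that the two artifacts above provide JPEG 2000 (JPX) support; JBIG2 decoding comes from a separate plugin, org.apache.pdfbox:jbig2-imageio.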
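
To see whether the image plugins are actually on the classpath at runtime, one option is to ask ImageIO which readers it has registered. The sketch below is only a diagnostic: the class name `ImageReaderCheck` and the helper `hasReader` are hypothetical, and the format names "jpeg2000" and "jbig2" are assumptions about what those plugins register, so check them against the plugin documentation.

```java
import javax.imageio.ImageIO;

public class ImageReaderCheck {
    // Returns true if at least one ImageIO reader is registered for the given format name.
    public static boolean hasReader(String formatName) {
        return ImageIO.getImageReadersByFormatName(formatName).hasNext();
    }

    public static void main(String[] args) {
        // "jpeg" ships with the JDK; the other two names are assumed plugin format names.
        for (String format : new String[] {"jpeg", "jpeg2000", "jbig2"}) {
            System.out.println(format + " reader available: " + hasReader(format));
        }
    }
}
```

If a format reports no reader, scanned pages stored in that format may not be decoded and can come out blank, which would match the symptom in the question.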

Answer 2

Score: 0

There are several issues with your code:

  1. You are not decoding the base64 data correctly. You read the file line by line, convert each line to bytes, and at the end pass the accumulated bytes to the Base64 decoder. You should change the method to:

    public static byte[] readBase64FromFile(String filePath) {
        File file = new File(filePath);
        StringBuilder base64StringBuilder = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new FileReader(file))) {
            String line;
            // readLine() strips line terminators, so the builder ends up with one unbroken base64 string
            while ((line = reader.readLine()) != null) {
                base64StringBuilder.append(line);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
        return Base64.getDecoder().decode(base64StringBuilder.toString());
    }

    Or read the whole file as bytes and decode it in one call:

    public static byte[] readBinaryFromFile(String filePath) {
        Path path2Pdf = Paths.get(filePath);
        byte[] pdf = new byte[]{};
        try {
            pdf = Files.readAllBytes(path2Pdf);
        } catch (IOException e) {
            e.printStackTrace();
        }
        // The basic decoder rejects line breaks, so the file must hold a single unbroken base64 string
        return Base64.getDecoder().decode(pdf);
    }

  2. Your convertPDFtoJPEG method discards most of its work. It renders every page of the document but returns only the last one (which might even be a blank page). You should return a list of base64-encoded JPEG pages instead:

    public static List<String> convertPDFtoJPEG(byte[] pdfData) {
        List<String> listOfBase64Images = new ArrayList<>();
        // try-with-resources closes the document even if rendering fails
        try (PDDocument document = PDDocument.load(pdfData)) {
            PDFRenderer renderer = new PDFRenderer(document);
            for (int pageIndex = 0; pageIndex < document.getNumberOfPages(); pageIndex++) {
                // Render one page at 300 DPI and store it as a base64-encoded JPEG
                BufferedImage image = renderer.renderImageWithDPI(pageIndex, 300);
                ByteArrayOutputStream baos = new ByteArrayOutputStream();
                ImageIO.write(image, "JPEG", baos);
                listOfBase64Images.add(Base64.getEncoder().encodeToString(baos.toByteArray()));
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
        return listOfBase64Images;
    }
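
Regarding point 1: the readBinaryFromFile variant hands the raw file bytes to the basic decoder, which throws an IllegalArgumentException if the text contains line breaks (for example a trailing newline or wrapped base64). If the file may be wrapped, the MIME decoder is one possible alternative, since it skips line separators. A minimal sketch, with the hypothetical names `LenientBase64` and `decodeLenient`:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class LenientBase64 {
    // Decodes base64 text that may contain line breaks; the MIME decoder
    // ignores line separators and other characters outside the base64 alphabet.
    public static byte[] decodeLenient(byte[] base64Text) {
        return Base64.getMimeDecoder().decode(base64Text);
    }

    public static void main(String[] args) {
        // "JVBERi0x" is the base64 form of "%PDF-1", split here across two lines.
        byte[] wrapped = "JVBE\r\nRi0x\n".getBytes(StandardCharsets.US_ASCII);
        byte[] decoded = decodeLenient(wrapped);
        System.out.println(new String(decoded, StandardCharsets.US_ASCII)); // prints "%PDF-1"
    }
}
```

The trade-off is that the MIME decoder also silently skips stray characters, so a corrupted file is less likely to fail fast.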

FYI: calling close() or flush() on a ByteArrayOutputStream has no effect.
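
Building on that note, the per-page encoding step can be written without close() or flush(). The sketch below also checks the boolean returned by ImageIO.write, which is false when no registered writer accepts the image; in that case the stream stays empty and the resulting base64 string would be empty too. The class name `JpegBase64Encoder` and the method `toBase64Jpeg` are hypothetical, and the blank 8x8 image merely stands in for a rendered PDF page.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Base64;
import javax.imageio.ImageIO;

public class JpegBase64Encoder {
    // Encodes an image as a base64 JPEG string and fails loudly if nothing was written.
    public static String toBase64Jpeg(BufferedImage image) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        // ImageIO.write returns false when no suitable writer is found for this image
        if (!ImageIO.write(image, "JPEG", baos)) {
            throw new IOException("No JPEG writer accepted image type " + image.getType());
        }
        // No flush() or close() needed: both are no-ops on ByteArrayOutputStream
        return Base64.getEncoder().encodeToString(baos.toByteArray());
    }

    public static void main(String[] args) throws IOException {
        // Placeholder for a page produced by a PDF renderer
        BufferedImage page = new BufferedImage(8, 8, BufferedImage.TYPE_INT_RGB);
        System.out.println(toBase64Jpeg(page).length() + " base64 characters");
    }
}
```

Throwing instead of ignoring the return value makes a failed write visible rather than passing an empty string downstream.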

huangapple
  • Posted on May 25, 2023 at 18:58:42
  • Please keep this link when reposting: https://go.coder-hub.com/76331546.html