May 25, 2023 18:58:42

Convert PDF to JPEG

Question

I'm trying to build a process that converts a base64-encoded PDF into base64-encoded JPEG images. The problem is that if the PDF was scanned at less than 600 dpi, the image is not rendered correctly: it comes out as a white image without any content.

This is my code:

public String convertPDFtoJPEG(byte[] pdfData) {
    try {
        PDDocument document = PDDocument.load(pdfData);
        PDFRenderer renderer = new PDFRenderer(document);

        for (int pageIndex = 0; pageIndex < document.getNumberOfPages(); pageIndex++) {
            BufferedImage image = renderer.renderImageWithDPI(pageIndex, 300);
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            ImageIO.write(image, "JPEG", baos);

            baos.flush();

            String base64JPEG = Base64.getEncoder().encodeToString(baos.toByteArray());

            baos.close();
        }

        document.close();
    } catch (IOException e) {
        e.printStackTrace();
    }
    return base64JPEG;
}

public byte[] readBase64FromFile(String filePath) {
    File file = new File(filePath);
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    try (BufferedReader reader = new BufferedReader(new FileReader(file))) {
        String line;
        while ((line = reader.readLine()) != null) {
            baos.write(line.getBytes());
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
    return Base64.getDecoder().decode(baos.toByteArray());
}

This is how I'm calling these methods:

String filePath = "path:\\to\\file.txt";
byte[] pdfData = readBase64FromFile(filePath);
convertPDFtoJPEG(pdfData);

file.txt contains the base64-encoded PDF.

Answer 1

Score: 1

As discussed in the comments, make sure you have the dependencies for JPX and JBIG2 image formats:

<dependency>
    <groupId>com.github.jai-imageio</groupId>
    <artifactId>jai-imageio-jpeg2000</artifactId>
    <version>1.4.0</version>
</dependency>
<dependency>
    <groupId>com.github.jai-imageio</groupId>
    <artifactId>jai-imageio-core</artifactId>
    <version>1.4.0</version>
</dependency>

I also recommend using the twelvemonkeys library for TIFF and JPEG.
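
One way to confirm that the extra image plugins are actually on the classpath is to ask ImageIO which reader formats it knows about. The following is a minimal sketch using only the JDK; the class name `ImageReaderCheck` is hypothetical, and the format names "jpeg2000" and "jbig2" are assumptions about how the plugins register themselves, so compare them against the printed list rather than treating them as authoritative.

```java
import javax.imageio.ImageIO;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

final class ImageReaderCheck {

    /** Returns the lower-cased format names that ImageIO can currently read. */
    static Set<String> readerFormats() {
        // Re-scan the classpath so plugins added as dependencies are registered.
        ImageIO.scanForPlugins();
        Set<String> formats = new TreeSet<>();
        for (String name : ImageIO.getReaderFormatNames()) {
            formats.add(name.toLowerCase(Locale.ROOT));
        }
        return formats;
    }

    public static void main(String[] args) {
        Set<String> formats = readerFormats();
        System.out.println("Registered readers: " + formats);
        // Assumed format names; check them against the list printed above.
        for (String wanted : new String[] {"jpeg2000", "jbig2"}) {
            System.out.println(wanted + " reader available: " + formats.contains(wanted));
        }
    }
}
```

If a format used by the scanned PDF is missing from the list, images in that format cannot be decoded during rendering, which would be consistent with the blank pages described in the question.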

Answer 2

Score: 0

There are several issues with your code:

You are not decoding the base64 data correctly. You read the file line by line and convert (each line) to bytes. Then at the end you pass the whole thing to the Base64 decoder converting it back to string. You should change the method to:

public static byte[] readBase64FromFile(String filePath) {
    File file = new File(filePath);
    StringBuilder base64StringBuilder = new StringBuilder();

    try (BufferedReader reader = new BufferedReader(new FileReader(file))) {
        String line;
        while ((line = reader.readLine()) != null) {
            base64StringBuilder.append(line);
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
    return Base64.getDecoder().decode(base64StringBuilder.toString());
}

Or read the whole thing as a binary:

public static byte[] readBinaryFromFile(String filePath) {
    Path path2Pdf = Paths.get(filePath);
    byte[] pdf = new byte[]{};
    try {
        pdf = Files.readAllBytes(path2Pdf);
    } catch (IOException e) {
        e.printStackTrace();
    }
    return Base64.getDecoder().decode(pdf);
}
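
One caveat when reading the file as raw bytes: the basic decoder returned by `Base64.getDecoder()` throws an `IllegalArgumentException` if the data contains line breaks, for example a trailing newline. A possible workaround, sketched here under the assumption that the file holds nothing but base64 text and line separators, is the JDK's MIME decoder, which skips such characters. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

final class Base64FileReader {

    /** Decodes base64 bytes, ignoring line separators and other non-base64 characters. */
    static byte[] decodeLenient(byte[] encoded) {
        return Base64.getMimeDecoder().decode(encoded);
    }

    /** Reads a text file that holds base64 data and returns the decoded bytes. */
    static byte[] readBase64File(Path path) throws IOException {
        return decodeLenient(Files.readAllBytes(path));
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample: base64 text split across two lines.
        String encoded = Base64.getEncoder()
                .encodeToString("%PDF-1.4 sample".getBytes(StandardCharsets.US_ASCII));
        String wrapped = encoded.substring(0, 8) + "\r\n" + encoded.substring(8) + "\n";

        Path tmp = Files.createTempFile("sample", ".txt");
        Files.write(tmp, wrapped.getBytes(StandardCharsets.US_ASCII));
        System.out.println(new String(readBase64File(tmp), StandardCharsets.US_ASCII));
        Files.delete(tmp);
    }
}
```

The trade-off is that the MIME decoder silently ignores stray characters, so a corrupted file may decode without an error instead of failing fast.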

Your convertPDFtoJPEG method doesn't make sense. You are reading the whole file but only return the last page (which might even be a white page). You should return a list of base64 jpeg pages:

public static List<String> convertPDFtoJPEG(byte[] pdfData) {
    List<String> listOfBase64Images = new ArrayList<>();

    try (PDDocument document = PDDocument.load(pdfData)) {
        PDFRenderer renderer = new PDFRenderer(document);

        for (int pageIndex = 0; pageIndex < document.getNumberOfPages(); pageIndex++) {
            BufferedImage image = renderer.renderImageWithDPI(pageIndex, 300);
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            ImageIO.write(image, "JPEG", baos);
            listOfBase64Images.add(Base64.getEncoder().encodeToString(baos.toByteArray()));
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
    return listOfBase64Images;
}
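
The per-page encoding step ignores the boolean returned by `ImageIO.write`, which is false when no registered writer can handle the image. The following sketch isolates that step and fails loudly instead. It uses only the JDK, with a blank `BufferedImage` standing in for a rendered page; the class and method names are hypothetical.

```java
import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Base64;

final class JpegEncoder {

    /** Encodes an image as a base64 JPEG string, failing if no writer accepts it. */
    static String toBase64Jpeg(BufferedImage image) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try {
            // ImageIO.write returns false when no appropriate writer is found.
            if (!ImageIO.write(image, "JPEG", baos)) {
                throw new IllegalStateException(
                        "No JPEG writer accepted image type " + image.getType());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Base64.getEncoder().encodeToString(baos.toByteArray());
    }

    public static void main(String[] args) {
        // Stand-in for a rendered page: a small opaque RGB image.
        BufferedImage page = new BufferedImage(16, 16, BufferedImage.TYPE_INT_RGB);
        System.out.println(toBase64Jpeg(page).length() + " base64 characters");
    }
}
```

Surfacing the failure this way makes it easier to tell an encoding problem apart from a rendering problem when a page comes out blank.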

And FYI: calling close() and flush() on a ByteArrayOutputStream has no effect.
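
A small JDK-only sketch of that behavior: the stream keeps accepting writes after close(), and the buffered bytes remain available. The class name `BaosCloseDemo` is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;

final class BaosCloseDemo {

    /** Writes a byte, flushes and closes the stream, writes again, and returns the size. */
    static int bytesAfterClose() {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        baos.write(1);
        try {
            baos.flush();
            baos.close();
        } catch (IOException e) {
            // Declared by the method signatures, but not expected for an in-memory stream.
            throw new AssertionError("unexpected I/O failure", e);
        }
        baos.write(2); // still accepted after close()
        return baos.size();
    }

    public static void main(String[] args) {
        System.out.println("Bytes in buffer: " + bytesAfterClose());
    }
}
```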
