August 10, 2020, 18:15:53

I want to convert a PDF to an image, but I want a single output image that contains only the images and vector graphics. I do not want the text.

Question

Please suggest how I can achieve this with PDFBox.

I tried the code below:

// try-with-resources closes the document even if rendering fails
try (PDDocument document = PDDocument.load(new File(inputFilePath))) {
    PDFRenderer pdfRenderer = new PDFRenderer(document);

    // Render every page at 300 DPI and write each one to its own PNG file
    for (int page = 0; page < document.getNumberOfPages(); ++page) {
        BufferedImage bim = pdfRenderer.renderImageWithDPI(page, 300, ImageType.RGB);
        ImageIOUtil.writeImage(bim, outputFilePath + "-" + (page + 1) + ".png", 300);
    }
} catch (Exception e) {
    e.printStackTrace();
}

I have attached the output I got.

See this picture; I don't want the text content:

I am expecting the output below; see this picture:

Answer 1

Score: 1

As a first step, you can remove the text from the PDF. If the text in your PDF is stored in the page content streams (and not in referenced form XObjects or annotations), you can use the PdfContentStreamEditor helper class from this answer (it is not a class shipped with PDFBox itself), for example like this:

PDDocument document = ...;
for (PDPage page : document.getDocumentCatalog().getPages()) {
    PdfContentStreamEditor identity = new PdfContentStreamEditor(document, page) {
        @Override
        protected void write(ContentStreamWriter contentStreamWriter, Operator operator, List<COSBase> operands) throws IOException {
            String operatorString = operator.getName();

            // Skip text-showing instructions; every other instruction is copied unchanged
            if (TEXT_SHOWING_OPERATORS.contains(operatorString)) {
                return;
            }

            super.write(contentStreamWriter, operator, operands);
        }

        // The four PDF operators that paint text
        final List<String> TEXT_SHOWING_OPERATORS = Arrays.asList("Tj", "'", "\"", "TJ");
    };
    // Rewrites the content stream of this page through the filter above
    identity.processPage(page);
}

(EditPageContent test testRemoveTextDocument)
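
To make the filtering idea easier to follow, here is a simplified sketch in plain Java that does not use PDFBox at all. It models a content stream as a list of instructions, each written as its operands followed by the operator name (the postfix order a PDF content stream uses), and drops only the instructions whose operator is one of the four text-showing operators. The class name TextOperatorFilter and the method dropTextShowing are hypothetical names chosen for this illustration; they are not part of PDFBox or of PdfContentStreamEditor.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration class; not part of PDFBox.
class TextOperatorFilter {
    // The four PDF text-showing operators, as listed in the answer.
    static final Set<String> TEXT_SHOWING_OPERATORS =
            new HashSet<>(Arrays.asList("Tj", "'", "\"", "TJ"));

    /**
     * Returns the instructions whose operator does not show text.
     * Each instruction is a token list: operands first, operator name last.
     */
    static List<List<String>> dropTextShowing(List<List<String>> instructions) {
        List<List<String>> kept = new ArrayList<>();
        for (List<String> instruction : instructions) {
            String operator = instruction.get(instruction.size() - 1);
            if (!TEXT_SHOWING_OPERATORS.contains(operator)) {
                kept.add(instruction);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // A tiny made-up page: one text object followed by a filled rectangle.
        List<List<String>> page = Arrays.asList(
                Arrays.asList("BT"),
                Arrays.asList("/F1", "12", "Tf"),
                Arrays.asList("(Hello)", "Tj"),
                Arrays.asList("ET"),
                Arrays.asList("0", "0", "100", "50", "re"),
                Arrays.asList("f"));

        // Prints every instruction except "(Hello) Tj".
        for (List<String> instruction : dropTextShowing(page)) {
            System.out.println(String.join(" ", instruction));
        }
    }
}
```

In this model the text object delimiters (BT, ET) and the font selection (Tf) stay in the stream, which is harmless because nothing is painted without a text-showing operator, while the path operators (re, f) that draw the vector graphics pass through untouched.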

If you want the result as a bitmap image, you can now render this document as you did before.
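
Since the question asks for a single output image rather than one file per page, the per-page renderings still have to be combined. The following is a minimal sketch of one way to do that, assuming the pages should simply be stacked top to bottom on a white background. It uses only java.awt, so the page images are stand-ins here; the class name PageStitcher and the method stitchVertically are hypothetical names, not PDFBox API.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of PDFBox.
class PageStitcher {
    /** Stacks the images top to bottom on a white canvas as wide as the widest image. */
    static BufferedImage stitchVertically(List<BufferedImage> pages) {
        if (pages.isEmpty()) {
            throw new IllegalArgumentException("no pages to stitch");
        }
        int width = 0;
        int height = 0;
        for (BufferedImage page : pages) {
            width = Math.max(width, page.getWidth());
            height += page.getHeight();
        }

        BufferedImage combined = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = combined.createGraphics();
        try {
            // White background so narrower pages are padded on the right
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, width, height);

            int y = 0;
            for (BufferedImage page : pages) {
                g.drawImage(page, 0, y, null);
                y += page.getHeight();
            }
        } finally {
            g.dispose();
        }
        return combined;
    }

    public static void main(String[] args) {
        // Stand-ins for the BufferedImages returned by renderImageWithDPI
        List<BufferedImage> pages = Arrays.asList(
                new BufferedImage(200, 100, BufferedImage.TYPE_INT_RGB),
                new BufferedImage(150, 80, BufferedImage.TYPE_INT_RGB));

        BufferedImage combined = stitchVertically(pages);
        System.out.println(combined.getWidth() + "x" + combined.getHeight()); // prints "200x180"
    }
}
```

In the rendering loop from the question, each BufferedImage returned by renderImageWithDPI could be collected into a list instead of being written out immediately, and the stitched result written once with ImageIOUtil.writeImage. Note that at 300 DPI a document with many pages produces a very large combined image, so a lower DPI may be needed to stay within available memory.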
