I want to convert a PDF to an image, but I only want a single output image that contains just the images and vector graphics. I do not want the text

Question

Please suggest how I can achieve this with PDFBox.

I tried the following code:

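import java.awt.image.BufferedImage;
import java.io.File;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.rendering.ImageType;
import org.apache.pdfbox.rendering.PDFRenderer;
import org.apache.pdfbox.tools.imageio.ImageIOUtil;

try {
    PDDocument document = PDDocument.load(new File(inputFilePath));
    PDFRenderer pdfRenderer = new PDFRenderer(document);

    // Renders every page, text included, to its own 300-DPI PNG file
    for (int page = 0; page < document.getNumberOfPages(); ++page) {
        BufferedImage bim = pdfRenderer.renderImageWithDPI(page, 300, ImageType.RGB);
        ImageIOUtil.writeImage(bim, outputFilePath + "-" + (page + 1) + ".png", 300);
    }

    document.close();
} catch (Exception e) {
    e.printStackTrace();
}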

I attached the output I got.

See this picture; I don't want the text content:

[Image: actual output]

I am expecting the following output; see this picture:

[Image: expected output]

Answer 1

Score: 1

As a first step, you can remove the text from the PDF. If the text in your PDF is stored in the page content streams (and not in referenced form XObjects or in annotations), you can use the PdfContentStreamEditor from this answer, for example like this:

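import java.io.IOException;
import java.util.Arrays;
import java.util.List;

import org.apache.pdfbox.contentstream.operator.Operator;
import org.apache.pdfbox.cos.COSBase;
import org.apache.pdfbox.pdfwriter.ContentStreamWriter;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;

PDDocument document = ...;
for (PDPage page : document.getDocumentCatalog().getPages()) {
    PdfContentStreamEditor identity = new PdfContentStreamEditor(document, page) {
        // The text-showing operators: Tj, ', " and TJ
        final List<String> TEXT_SHOWING_OPERATORS = Arrays.asList("Tj", "'", "\"", "TJ");

        @Override
        protected void write(ContentStreamWriter contentStreamWriter, Operator operator, List<COSBase> operands) throws IOException {
            String operatorString = operator.getName();

            // Skip text-showing instructions (and thereby their operands)
            if (TEXT_SHOWING_OPERATORS.contains(operatorString)) {
                return;
            }

            // Copy every other instruction unchanged
            super.write(contentStreamWriter, operator, operands);
        }
    };
    identity.processPage(page);
}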

(EditPageContent test testRemoveTextDocument)
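If you would rather not copy the PdfContentStreamEditor class, the same idea (drop the text-showing operators Tj, TJ, ' and " from the page content streams) can be sketched with PDFBox's own PDFStreamParser and ContentStreamWriter. This is a minimal sketch assuming PDFBox 2.0.x; the class TextOperatorRemover and its methods are hypothetical names, and like the code above it only rewrites page content streams, not form XObjects or annotations.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

import org.apache.pdfbox.contentstream.operator.Operator;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdfparser.PDFStreamParser;
import org.apache.pdfbox.pdfwriter.ContentStreamWriter;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.common.PDStream;

// Hypothetical helper class; assumes PDFBox 2.0.x.
public class TextOperatorRemover {
    // Text-showing operators as defined by the PDF specification.
    static final Set<String> TEXT_SHOWING_OPERATORS =
            new HashSet<>(Arrays.asList("Tj", "TJ", "'", "\""));

    // Drops every text-showing operator together with the operands that precede it;
    // all other operands and operators are copied unchanged, in order.
    static List<Object> removeTextShowing(List<Object> tokens) {
        List<Object> result = new ArrayList<>();
        List<Object> pendingOperands = new ArrayList<>();
        for (Object token : tokens) {
            if (token instanceof Operator) {
                String name = ((Operator) token).getName();
                if (!TEXT_SHOWING_OPERATORS.contains(name)) {
                    result.addAll(pendingOperands);
                    result.add(token);
                }
                pendingOperands.clear();
            } else {
                pendingOperands.add(token);
            }
        }
        result.addAll(pendingOperands); // operands not followed by any operator, if any
        return result;
    }

    // Replaces the content stream of every page with a text-free copy.
    static void removeText(PDDocument document) throws IOException {
        for (PDPage page : document.getPages()) {
            PDFStreamParser parser = new PDFStreamParser(page);
            parser.parse();
            List<Object> filtered = removeTextShowing(parser.getTokens());

            PDStream newContents = new PDStream(document);
            try (OutputStream out = newContents.createOutputStream(COSName.FLATE_DECODE)) {
                new ContentStreamWriter(out).writeTokens(filtered);
            }
            page.setContents(newContents);
        }
    }
}
```

Operands are buffered until the next operator arrives, so a dropped Tj also loses its string operand and the rewritten stream stays well-formed. Text-state operators such as BT, ET and Tf are kept; they do not paint anything by themselves.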

If you want the result as a bitmap image, you can now render this document as you did before.
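The question's loop still writes one PNG per page. If "single output image" means one file for the whole document, one option (an assumption, not part of the answer above) is to render each page of the text-free document and stack the page images vertically with plain Java 2D. PageStitcher and stackVertically are hypothetical names used only for this sketch.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.util.List;

// Hypothetical helper: combines rendered pages into one tall image.
public class PageStitcher {
    // Stacks the page images top to bottom on a white RGB canvas that is as wide
    // as the widest page and as tall as all pages together.
    static BufferedImage stackVertically(List<BufferedImage> pages) {
        if (pages.isEmpty()) {
            throw new IllegalArgumentException("no pages to stack");
        }
        int width = 0;
        int height = 0;
        for (BufferedImage page : pages) {
            width = Math.max(width, page.getWidth());
            height += page.getHeight();
        }
        BufferedImage result = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = result.createGraphics();
        try {
            // White background so narrower pages are not padded with black
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, width, height);
            int y = 0;
            for (BufferedImage page : pages) {
                g.drawImage(page, 0, y, null);
                y += page.getHeight();
            }
        } finally {
            g.dispose();
        }
        return result;
    }
}
```

To use it, collect the images returned by pdfRenderer.renderImageWithDPI(page, 300, ImageType.RGB) into a list instead of writing each one immediately, then pass stackVertically(images) to ImageIOUtil.writeImage once. At 300 DPI every page is several megapixels, so for long documents a lower DPI keeps the combined image manageable.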

huangapple
  • Posted on 2020-08-10 18:15:53
  • When reposting, please keep the link to this article: https://go.coder-hub.com/63338236.html