I want to convert a PDF to an image, but I want only a single output image that contains just the images and vector graphics. I do not want the text.
Question
Please suggest how I can achieve this with PDFBox.
I tried the code below:
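    import java.awt.image.BufferedImage;
    import java.io.File;
    import org.apache.pdfbox.pdmodel.PDDocument;
    import org.apache.pdfbox.rendering.ImageType;
    import org.apache.pdfbox.rendering.PDFRenderer;
    import org.apache.pdfbox.tools.imageio.ImageIOUtil;

    // try-with-resources closes the document even if rendering fails
    try (PDDocument document = PDDocument.load(new File(inputFilePath))) {
        PDFRenderer pdfRenderer = new PDFRenderer(document);
        for (int page = 0; page < document.getNumberOfPages(); ++page) {
            // Render each page at 300 DPI and write it to its own PNG file
            BufferedImage bim = pdfRenderer.renderImageWithDPI(page, 300, ImageType.RGB);
            ImageIOUtil.writeImage(bim, outputFilePath + "-" + (page + 1) + ".png", 300);
        }
    } catch (Exception e) {
        e.printStackTrace();
    }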
I have attached the output I got.
See this picture; I do not want the text content:
This is the output I expect, see this picture:
Answer 1
Score: 1
As a first step, you can remove the text from the PDF. If the text in your PDF is stored in the page content streams (and not in referenced form XObjects or in annotations), you can use the PdfContentStreamEditor from this answer, e.g. like this:
    PDDocument document = ...;
    for (PDPage page : document.getDocumentCatalog().getPages()) {
        // PdfContentStreamEditor is the helper class from the referenced answer, not a PDFBox class
        PdfContentStreamEditor identity = new PdfContentStreamEditor(document, page) {
            // Operators that actually paint text onto the page
            final List<String> TEXT_SHOWING_OPERATORS = Arrays.asList("Tj", "'", "\"", "TJ");

            @Override
            protected void write(ContentStreamWriter contentStreamWriter, Operator operator, List<COSBase> operands) throws IOException {
                // Drop text-showing instructions; copy every other instruction unchanged
                if (TEXT_SHOWING_OPERATORS.contains(operator.getName())) {
                    return;
                }
                super.write(contentStreamWriter, operator, operands);
            }
        };
        identity.processPage(page);
    }
(EditPageContent test testRemoveTextDocument)
If you want the result as a bitmap image, you can now render this document as you did before.
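The question asks for a single output image, while the rendering loop above produces one image per page. One possible way to combine them, sketched here with plain java.awt only, is to stack the rendered pages vertically on one canvas. The names PageStacker and stackVertically are hypothetical helpers introduced for illustration and are not part of PDFBox; the white background and left alignment are assumptions as well.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.util.List;

// Hypothetical helper (not part of PDFBox): stacks images top to bottom.
class PageStacker {
    static BufferedImage stackVertically(List<BufferedImage> pages) {
        if (pages.isEmpty()) {
            throw new IllegalArgumentException("no pages to combine");
        }
        // The canvas is as wide as the widest page and as tall as all pages together.
        int width = 0;
        int height = 0;
        for (BufferedImage page : pages) {
            width = Math.max(width, page.getWidth());
            height += page.getHeight();
        }
        BufferedImage combined = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = combined.createGraphics();
        try {
            // Assumption: narrower pages are left-aligned on a white background.
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, width, height);
            int y = 0;
            for (BufferedImage page : pages) {
                g.drawImage(page, 0, y, null);
                y += page.getHeight();
            }
        } finally {
            g.dispose();
        }
        return combined;
    }
}
```

After the text has been removed, the result of each pdfRenderer.renderImageWithDPI(page, 300, ImageType.RGB) call could be collected into a list, passed to PageStacker.stackVertically, and the combined image written once with ImageIOUtil.writeImage. At 300 DPI the combined image of a long document can become very large, so a lower resolution may be needed.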