How to reduce the size of PNG images in a PDF (compress PNG in PDF)

Question

I want to reduce the size of a PDF file by replacing its high-resolution images with lower-resolution ones. To do that, I have to:

1. Extract the images (streams) from the PDF.
2. Compress the images.
3. Replace the images (streams) in the PDF with the compressed images.
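
Step 2 can be sketched with nothing but the JDK. The class below is a minimal illustration, not the `SimpleCompress` class used later in this question (its source is not shown); the name `AlphaPreservingScaler` is made up for this example. It downscales a `BufferedImage` and keeps the alpha channel when the source has one, which matters for transparent PNGs.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

// Hypothetical helper for illustration only; not part of the original post.
final class AlphaPreservingScaler {

    /** Scales src by the given factor, keeping an alpha channel if src has one. */
    static BufferedImage scale(BufferedImage src, double factor) {
        int w = Math.max(1, (int) Math.round(src.getWidth() * factor));
        int h = Math.max(1, (int) Math.round(src.getHeight() * factor));
        // Choose an ARGB target only when the source actually carries alpha.
        int type = src.getColorModel().hasAlpha()
                ? BufferedImage.TYPE_INT_ARGB
                : BufferedImage.TYPE_INT_RGB;
        BufferedImage dst = new BufferedImage(w, h, type);
        Graphics2D g = dst.createGraphics();
        try {
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                    RenderingHints.VALUE_INTERPOLATION_BILINEAR);
            g.drawImage(src, 0, 0, w, h, null);
        } finally {
            g.dispose();
        }
        return dst;
    }
}
```

Calling `AlphaPreservingScaler.scale(image, 0.6)` would correspond to the `scale = 0.6d` used in the test further down; the JPEG quality setting is a separate concern that this sketch does not cover.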

When I extract PNG images and replace them, the transparent background turns black. To find out why, I extracted the images from the PDF. The PDF stores a PNG in a way I find strange: it uses two streams. So when I try to extract one PNG image from the PDF, I get two different images: an 8-bit grayscale image (object 1 below, `/DeviceGray`) and a 24-bit color image (object 2 below, which points to object 1 through `/SMask`).

```
...
1 0 obj
<</Type/XObject/Subtype/Image/Width 1920/Height 1035/Length 24720/ColorSpace/DeviceGray/BitsPerComponent 8/Filter/FlateDecode>>stream
...
endstream
endobj
2 0 obj
<</Type/XObject/Subtype/Image/Width 1920/Height 1035/SMask 1 0 R/Length 47751/ColorSpace[/CalRGB<</Gamma[2.2 2.2 2.2]/Matrix[0.41239 0.21264 0.01933 0.35758 0.71517 0.11919 0.18045 0.07218 0.9504]/WhitePoint[0.95043 1 1.09]>>]/Intent/Perceptual/BitsPerComponent 8/Filter/FlateDecode>>stream
...
endstream
...
```
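
The two streams above can be read as a color image plus a separate 8-bit soft mask that holds the alpha channel. The following JDK-only sketch mimics that split and shows why dropping the mask turns transparent pixels black. The class and method names are invented for this illustration, and the sketch ignores the `/CalRGB` color space from the dump.

```java
import java.awt.image.BufferedImage;

// Hypothetical demo class; it models the idea of /SMask, not the PDF encoding itself.
final class SoftMaskDemo {

    /** The 24-bit part: color samples only, alpha discarded. */
    static BufferedImage colorPart(BufferedImage argb) {
        BufferedImage rgb = new BufferedImage(argb.getWidth(), argb.getHeight(),
                BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < argb.getHeight(); y++) {
            for (int x = 0; x < argb.getWidth(); x++) {
                rgb.setRGB(x, y, argb.getRGB(x, y)); // TYPE_INT_RGB drops the alpha byte
            }
        }
        return rgb;
    }

    /** The 8-bit part: one gray sample per pixel, equal to the alpha value. */
    static BufferedImage alphaPart(BufferedImage argb) {
        BufferedImage gray = new BufferedImage(argb.getWidth(), argb.getHeight(),
                BufferedImage.TYPE_BYTE_GRAY);
        for (int y = 0; y < argb.getHeight(); y++) {
            for (int x = 0; x < argb.getWidth(); x++) {
                gray.getRaster().setSample(x, y, 0, argb.getRGB(x, y) >>> 24);
            }
        }
        return gray;
    }

    /** Puts the color part and the mask back together into one ARGB image. */
    static BufferedImage recombine(BufferedImage rgb, BufferedImage mask) {
        BufferedImage out = new BufferedImage(rgb.getWidth(), rgb.getHeight(),
                BufferedImage.TYPE_INT_ARGB);
        for (int y = 0; y < rgb.getHeight(); y++) {
            for (int x = 0; x < rgb.getWidth(); x++) {
                int alpha = mask.getRaster().getSample(x, y, 0);
                out.setRGB(x, y, (alpha << 24) | (rgb.getRGB(x, y) & 0x00FFFFFF));
            }
        }
        return out;
    }
}
```

A fully transparent pixel whose stored color is black comes out of `colorPart` as opaque black, so replacing only the color stream, or discarding the mask reference, would plausibly give the black background described above.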


Original image (32-bit color image with a transparent background):  
![original image](https://i.stack.imgur.com/OXpND.png)  
An 8-bit color image:  
![8-bit color](https://i.stack.imgur.com/OA7SG.png)  
A 24-bit color image:  
![24-bit color](https://i.stack.imgur.com/eRKCN.png)
```xml
<dependency>
    <groupId>com.itextpdf</groupId>
    <artifactId>itextpdf</artifactId>
    <version>5.5.12</version>
</dependency>
<dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>pdfbox</artifactId>
    <version>2.0.16</version>
</dependency>
```

`ImageExtractor` extracts the images from the PDF file.

```java
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageTree;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.pdmodel.graphics.PDXObject;
import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import javax.imageio.ImageIO;
import java.io.File;
import java.io.IOException;

public class ImageExtractor {

    private static final Logger log = LoggerFactory.getLogger(ImageExtractor.class);

    public void extract(File pdf, File imageDir) throws IOException {
        if (!imageDir.exists()) {
            imageDir.mkdirs();
        }
        // try-with-resources closes the document even if extraction fails
        try (PDDocument document = PDDocument.load(pdf)) {
            PDPageTree list = document.getPages();
            System.out.println("PDPageTree#count: " + list.getCount());
            int pageIndex = 1;
            for (PDPage page : list) {
                PDResources pdResources = page.getResources();
                System.out.println(pdResources.toString());
                for (COSName c : pdResources.getXObjectNames()) {
                    System.out.println("PDResources[" + pageIndex + "]#COSName: " + c.getName());
                    PDXObject o = pdResources.getXObject(c);
                    System.out.println("PDResources[" + pageIndex + "]#PDXObject: " + o.toString());
                    // https://github.com/mkl-public/testarea-itext5/blob/master/src/test/java/mkl/testarea/itext5/extract/ImageExtraction.java
                    if (o instanceof PDImageXObject) {
                        PDImageXObject img = (PDImageXObject) o;
                        // One file per image XObject, named after the page and a timestamp
                        File file = new File(imageDir, pageIndex + "-" + System.nanoTime() + "." + img.getSuffix());
                        ImageIO.write(img.getImage(), img.getSuffix(), file);
                    }
                }
                pageIndex++;
            }
        }
        log.info("Images have been extracted successfully! Check your images folder.");
    }
}
```

`ReplaceHightResolutionImage` is the code I use to reduce the size of the PDF.

```java
package io.gitlab.donespeak.tutorial.pdf.reducesize.itext;

import com.itextpdf.text.DocumentException;
import com.itextpdf.text.pdf.PRStream;
import com.itextpdf.text.pdf.PdfName;
import com.itextpdf.text.pdf.PdfNumber;
import com.itextpdf.text.pdf.PdfObject;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.PdfStamper;
import com.itextpdf.text.pdf.PdfStream;
import com.itextpdf.text.pdf.parser.PdfImageObject;
import io.gitlab.donespeak.tutorial.pdf.reducesize.imagecompress.ImageCompressor;
import io.gitlab.donespeak.tutorial.pdf.reducesize.imagecompress.SimpleCompress;

import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class ReplaceHightResolutionImage {

    private ImageCompressor compressor;
    private double quality;
    private double scale;

    public ReplaceHightResolutionImage(double quality, double scale) {
        this.compressor = new SimpleCompress();
        this.quality = quality;
        this.scale = scale;
    }

    public ReplaceHightResolutionImage(double quality, double scale, ImageCompressor compressor) {
        this.compressor = compressor;
        this.quality = quality;
        this.scale = scale;
    }

    public void replace(File pdf, File output) throws IOException, DocumentException {
        PdfReader reader = new PdfReader(new FileInputStream(pdf));
        int n = reader.getXrefSize();
        PdfObject object;
        PRStream stream;

        for (int i = 0; i < n; i++) {

            object = reader.getPdfObject(i);
            stream = findImageStream(object);
            if (stream == null) {
                continue;
            }
            PdfImageObject pdfImageObject = new PdfImageObject(stream);
            BufferedImage bi = pdfImageObject.getBufferedImage();
            if (bi == null) {
                continue;
            }
            System.out.println("PdfReader#Xref: " + i + "," + pdfImageObject.getFileType());
            BufferedImage resultImage = compressor.compress(bi, pdfImageObject.getFileType(), quality, scale);
            replaceImage(stream, resultImage);
        }

        PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(output));
        // furtherCompress(reader, stamper);
        stamper.close();
    }

    private void furtherCompress(PdfReader reader, PdfStamper stamper) throws DocumentException {
        reader.removeFields();
        reader.removeUnusedObjects();
        stamper.setFullCompression();
        stamper.getWriter().setCompressionLevel(PdfStream.DEFAULT_COMPRESSION);
    }

    private PRStream findImageStream(PdfObject object) {
        PRStream stream;
        if (object == null || !object.isStream()) {
            return null;
        }
        stream = (PRStream)object;
        System.out.println(stream.getAsName(PdfName.SUBTYPE));
        if (!PdfName.IMAGE.equals(stream.getAsName(PdfName.SUBTYPE))) {
            // not jpg or png
            return null;
        }
        PdfName pdfName = stream.getAsName(PdfName.FILTER);
        if (!PdfName.DCTDECODE.equals(pdfName) && !PdfName.FLATEDECODE.equals(pdfName)) {
            return null;
        }
        // if (PdfName.DCTDECODE.equals(filter)) {
        //     return PdfImageObject.ImageBytesType.JPG.getFileExtension();
        // } else if (PdfName.JPXDECODE.equals(filter)) {
        //     return PdfImageObject.ImageBytesType.JP2.getFileExtension();
        // } else if (PdfName.FLATEDECODE.equals(filter)) {
        //     return PdfImageObject.ImageBytesType.PNG.getFileExtension();
        // } else if (PdfName.LZWDECODE.equals(filter)) {
        //     return PdfImageObject.ImageBytesType.CCITT.getFileExtension();
        // }
        return stream;
    }

    private void replaceImage(PRStream stream, BufferedImage resultImage) throws IOException {

        ByteArrayOutputStream imgBytes = new ByteArrayOutputStream();
        ImageIO.write(resultImage, "JPG", imgBytes);

        stream.clear();
        stream.setData(imgBytes.toByteArray(), false, PRStream.NO_COMPRESSION);
        stream.put(PdfName.TYPE, PdfName.XOBJECT);
        stream.put(PdfName.SUBTYPE, PdfName.IMAGE);
        stream.put(PdfName.FILTER, PdfName.DCTDECODE);
        stream.put(PdfName.WIDTH, new PdfNumber(resultImage.getWidth()));
        stream.put(PdfName.HEIGHT, new PdfNumber(resultImage.getHeight()));
        stream.put(PdfName.BITSPERCOMPONENT, new PdfNumber(8));
        stream.put(PdfName.COLORSPACE, PdfName.DEVICERGB);
    }
}
```
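
A possible explanation for the black background, offered as an assumption rather than a confirmed diagnosis: `replaceImage` re-encodes every image as JPEG, which has no alpha channel, and `stream.clear()` removes the existing dictionary entries, including any `/SMask` reference. If dropping transparency is acceptable, one workaround is to paint the image onto an opaque background before JPEG encoding. The sketch below uses only the JDK; the class name `JpegFlatten` and the choice of white are illustrative.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

// Hypothetical helper for illustration only; not part of the original post.
final class JpegFlatten {

    /** Paints src over a white background and returns an opaque RGB image. */
    static BufferedImage flattenOnWhite(BufferedImage src) {
        BufferedImage rgb = new BufferedImage(src.getWidth(), src.getHeight(),
                BufferedImage.TYPE_INT_RGB);
        Graphics2D g = rgb.createGraphics();
        try {
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, rgb.getWidth(), rgb.getHeight());
            g.drawImage(src, 0, 0, null); // transparent areas let the white show through
        } finally {
            g.dispose();
        }
        return rgb;
    }
}
```

This trades the black background for a white one; it does not restore transparency. Keeping real transparency would require writing a soft mask alongside the color image instead of a single JPEG stream.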
```java
package io.gitlab.donespeak.tutorial.pdf.reducesize.itext;

import io.gitlab.donespeak.tutorial.pdf.reducesize.imagecompress.ImageCompressor;
import net.coobird.thumbnailator.Thumbnails;

import java.awt.image.BufferedImage;
import java.io.IOException;

public class ThumbnailatorCompressor implements ImageCompressor {

    @Override
    public BufferedImage compress(BufferedImage image, String imageFormat, double quality, double scale) throws IOException {
        System.out.println("ThumbnailatorCompressor#type: " + image.getType());
        // int imageType = "png".equalsIgnoreCase(imageFormat)? BufferedImage.TYPE_INT_ARGB: image.getType();
        BufferedImage thumbnail = Thumbnails.of(image)
            .imageType(image.getType())
            .scale(scale)
            .outputQuality(quality)
            // .outputFormat(imageFormat)
            .useOriginalFormat()
            .asBufferedImage();

        return thumbnail;
    }
}
```
```java
public class ReplaceHightResolutionImageTest {

    @Test
    public void reduceWithThumbnailatorCompressor() throws IOException, DocumentException {
        double quality = 1d;
        double scale = 0.6d;
        File pdf = new File("pdf/asset/horse.pdf");
        File output = new File("pdf/target/output", "replaced-" + quality + "-" + scale);
        ReplaceHightResolutionImage replacer = new ReplaceHightResolutionImage(quality, scale, new SimpleCompress());
        replacer.replace(pdf, output);
    }
}
```

Answer 1

Score: 1

Here is an answer that works but is not good enough. It compresses JPG and PNG images well. Its one shortcoming: if an image is reused on many pages, each reference to it is treated as a separate image, and a new stream is created to replace each reference, which can make the file larger. In the following PDF, for example, both pages (objects 4 and 6) reference the same image stream, object 1:

```
1 0 obj
<</Type/XObject/Subtype/Image/Width 1002/Height 564/Filter/DCTDecode/ColorSpace/DeviceRGB/BitsPerComponent 8/Length 89149>>stream
...
endstream
endobj
2 0 obj
<</Length 106/Filter/FlateDecode>>stream
...
endstream
endobj
4 0 obj
<</Type/Page/MediaBox[0 0 595 842]/Resources<</XObject<</img0 1 0 R>>>>/Contents 2 0 R/Parent 3 0 R>>
endobj
5 0 obj
<</Length 106/Filter/FlateDecode>>stream
...
endstream
endobj
6 0 obj
<</Type/Page/MediaBox[0 0 595 842]/Resources<</XObject<</img0 1 0 R>>>>/Contents 5 0 R/Parent 3 0 R>>
endobj
```
```java
package io.gitlab.donespeak.tutorial.pdf.reducesize;

import io.gitlab.donespeak.tutorial.pdf.reducesize.imagecompress.ThumbnailatorCompressor;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDDocumentCatalog;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.pdmodel.graphics.PDXObject;
import org.apache.pdfbox.pdmodel.graphics.image.JPEGFactory;
import org.apache.pdfbox.pdmodel.graphics.image.LosslessFactory;
import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;

import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;

public class RemoveAllImageFromPdf {

    public static void extractImages(File input, File imageDir) throws IOException {
        if(imageDir.exists()) {
            imageDir.delete();
        }
        imageDir.mkdirs();
        PDDocument document = PDDocument.load(input);
        int pageIndex = 1;
        PDDocumentCatalog catalog = document.getDocumentCatalog();
        for (PDPage page : catalog.getPages()) {
            PDResources pdResources = page.getResources();
            System.out.println(pdResources.toString());
            for (COSName c : pdResources.getXObjectNames()) {
                System.out.println("PDResources[" + pageIndex + "]#COSName: " + c.getName());
                PDXObject o = pdResources.getXObject(c);
                System.out.println("PDResources[" + pageIndex + "]#PDXObject: " + o.toString());
                // https://github.com/mkl-public/testarea-itext5/blob/master/src/test/java/mkl/testarea/itext5/extract/ImageExtraction.java
                if (o instanceof PDImageXObject) {
                    PDImageXObject img = (PDImageXObject) o;
                    System.out.println(img.getSuffix() + "-" + img.getBitsPerComponent() + "-" + img.getColorSpace());
                    File file = new File(imageDir, pageIndex + "-" + c.getName() + "-" + img.getColorSpace() + "-" + System.nanoTime() + "." + img.getSuffix());
                    ImageIO.write(((PDImageXObject)o).getImage(), img.getSuffix(), file);
                }
            }
            pageIndex ++;
        }
        // document.save(output);
    }

    /**
     * Compresses every image XObject referenced from a page's resources and
     * writes the result to a new file.
     *
     * @param input  the PDF to read
     * @param output the PDF to write
     * @throws IOException if reading, compressing, or saving fails
     */
    public static void compress(File input, File output) throws IOException {
        if(!output.getParentFile().exists()) {
            output.getParentFile().mkdirs();
        }
        ThumbnailatorCompressor compressor = new ThumbnailatorCompressor();
        PDDocument document = PDDocument.load(input);
        int pageIndex = 1;
        PDDocumentCatalog catalog = document.getDocumentCatalog();

        for (PDPage page : catalog.getPages()) {
            PDResources pdResources = page.getResources();
            for (COSName c : pdResources.getXObjectNames()) {
                System.out.println("PDResources[" + pageIndex + "]#COSName: " + c.getName());
                PDXObject o = pdResources.getXObject(c);
                System.out.println("PDResources[" + pageIndex + "]#PDXObject: " + o.toString());
                // https://github.com/mkl-public/testarea-itext5/blob/master/src/test/java/mkl/testarea/itext5/extract/ImageExtraction.java
                if (o instanceof PDImageXObject) {
                    PDImageXObject img = (PDImageXObject) o;
                    BufferedImage bufferedImage = compressor.compress(img.getImage(), img.getSuffix(), 0.8, 0.5);
                    PDImageXObject imgNew = null;
                    System.out.println("img(w, h): (" + img.getWidth() + "," + img.getHeight() + ")");
                    System.out.println("bufferedImage(w, h): (" + bufferedImage.getWidth() + "," + bufferedImage.getHeight() + ")");
                    if("png".equalsIgnoreCase(img.getSuffix())) {
                        imgNew = LosslessFactory.createFromImage(document, bufferedImage);
                    } else {
                        imgNew = JPEGFactory.createFromImage(document, bufferedImage);
                    }
                    // A new image object is created for every reference, even if the source image is shared
                    pdResources.put(c, imgNew);
                }
            }
            pageIndex ++;
        }
        if(!output.getParentFile().exists()) {
            output.getParentFile().mkdirs();
        }
        document.save(output);
        document.close();
    }
}
```

Processing the objects in the document directly, with one of the following calls, might solve the problem above, but I have no idea how to replace a stream that way.

```java
new com.itextpdf.text.pdf.PdfReader(new FileInputStream(pdf)).getPdfObject(i);
// or
org.apache.pdfbox.pdmodel.PDDocument.load(pdf).getDocument().getObjects();
```
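
Another way to avoid duplicate streams, without touching the low-level object list, might be to remember which source objects have already been compressed and reuse the replacement. The sketch below shows only that caching idea with plain JDK types; `SharedImageReplacer` and its generic resource maps are stand-ins invented for this example. In PDFBox the cache key could plausibly be the `COSStream` returned by `PDImageXObject.getCOSObject()`, with the replacement written back through `pdResources.put(c, imgNew)` as in the answer, but that mapping is an assumption and is not tested here.

```java
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical helper for illustration only; not part of the original post.
final class SharedImageReplacer {

    /**
     * Replaces each resource value with its compressed form, compressing every
     * distinct source object only once. Returns the number of compressions run.
     */
    static <T> int replaceAll(List<Map<String, T>> pages, UnaryOperator<T> compress) {
        // Identity-based: two references to the same object share one cache entry.
        Map<T, T> done = new IdentityHashMap<>();
        int compressions = 0;
        for (Map<String, T> resources : pages) {
            for (Map.Entry<String, T> entry : resources.entrySet()) {
                T original = entry.getValue();
                T replacement = done.get(original);
                if (replacement == null) {
                    replacement = compress.apply(original);
                    done.put(original, replacement);
                    // Guard against compressing a replacement again if a
                    // resource map is shared between pages.
                    done.put(replacement, replacement);
                    compressions++;
                }
                entry.setValue(replacement);
            }
        }
        return compressions;
    }
}
```

With two pages that both point at the same image object, the compressor runs once and both pages end up referencing the same replacement, which is the behavior the two-page PDF dump above calls for.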

huangapple
  • Posted on May 4, 2020 at 18:34:03
  • When reposting, please keep the link to this article: https://go.coder-hub.com/61590198.html