When opening pdf file in Adobe reader, warning showing "An error exists on this page. Acrobat may not display the page correctly"

Question

I am currently using the PDFBox 2.x library to remove and add QR code images after loading a PDF file from the file system. After I remove the QR code from the PDF file, save it, and open the modified document in Adobe Reader, the warning above appears: "An error exists on this page. Acrobat may not display the page correctly". The QR code image is removed successfully, but the warning is shown when the file is opened.

Also, before removing the QR code image from the PDF file, the file size was 6.8 MB. After removing the QR code, the file size increases to 8.1 MB.

The modified document without the QR code image should open without the warning "An error exists on this page. Acrobat may not display the page correctly". The original file opens without any warning.

I also expected the file size to decrease, or at least stay the same, after removing the QR code image, rather than increase.

Can you please help?

Below is the code that removes the QR code image from the PDF file:

pdDocument = PDDocument.load(new File(aBarcodeVO.getSourceFilePath()));
newDocument = new PDDocument();
for (int pageCount = 0; pageCount < pdDocument.getNumberOfPages(); pageCount++) {
    PDPage pdPage = newDocument.importPage(pdDocument.getPage(pageCount));
    String imgUniqueId = aBarcodeVO.getImgUniqueId().concat(String.valueOf(pageCount));
    boolean hasQRCodeOnPage = removeQRCodeImage(newDocument, pdPage, imgUniqueId);
    qRCodePageList.add(hasQRCodeOnPage);
}
if (qRCodePageList.contains(true)) {
    newDocument.save(aBarcodeVO.getDestinationFilePath(true));
}
newDocument.close();
pdDocument.close();

public static boolean removeQRCodeImage(PDDocument document, PDPage page, String imgUniqueId) throws Exception {
    String qrCodeCosName = null;
    PDResources pdResources = page.getResources();
    boolean hasQRCodeOnPage = false;
    for (COSName propertyName : pdResources.getXObjectNames()) {
        if (!pdResources.isImageXObject(propertyName)) {
            continue;
        }
        PDXObject o;
        try {
            o = pdResources.getXObject(propertyName);
            if (o instanceof PDImageXObject) {
                PDImageXObject pdImageXObject = (PDImageXObject) o;
                if (pdImageXObject.getMetadata() != null) {
                    DomXmpParser xmpParser = new DomXmpParser();
                    XMPMetadata xmpMetadata = xmpParser.parse(pdImageXObject.getMetadata().toByteArray());
                    if (xmpMetadata.getDublinCoreSchema() != null
                            && StringUtils.isNoneBlank(xmpMetadata.getDublinCoreSchema().getTitle())
                            && xmpMetadata.getDublinCoreSchema().getTitle().contains("_barcodeimg_")) {
                        ((COSDictionary) pdResources.getCOSObject().getDictionaryObject(COSName.XOBJECT))
                                .removeItem(propertyName);
                        log.debug("propertyName REMOVED--" + propertyName.getName());
                        qrCodeCosName = propertyName.getName();
                        hasQRCodeOnPage = true;
                    }
                }
            }
        } catch (IOException e) {
            log.error("Exception in removeQRCodeImage() while extracting QR image:" + e, e);
        }
    }
    PDFStreamParser parser = new PDFStreamParser(page);
    parser.parse();
    List<Object> tokens = parser.getTokens();
    log.debug("original tokens size" + tokens.size());
    List<Object> newTokens = new ArrayList<Object>();
    for (int j = 0; j < tokens.size(); j++) {
        Object token = tokens.get(j);
        if (token instanceof Operator) {
            Operator op = (Operator) token;
            // find image - remove it
            if (op.getName().equals("Do")) {
                COSName cosName = (COSName) tokens.get(j - 1);
                if (cosName.getName().equals(qrCodeCosName)) {
                    newTokens.remove(newTokens.size() - 1);
                    continue;
                }
            }
        }
        newTokens.add(token);
    }
    log.debug("tokens size" + newTokens.size());
    PDStream newContents = new PDStream(document);
    OutputStream out = newContents.createOutputStream();
    ContentStreamWriter writer = new ContentStreamWriter(out);
    writer.writeTokens(newTokens);
    out.close();
    page.setContents(newContents);
    return hasQRCodeOnPage;
}

Answer 1

Score: 0

One possible cause: PDF resources can be shared across pages; the same Resources object may even be used by multiple pages. If your document is structured this way, manipulating the resources of one page may actually change the resources of all pages, while your content stream manipulation changes only a single page. Uses of the same image on other pages could therefore cause the error message you observed.
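To illustrate the first point, here is a minimal, stdlib-only sketch. It does not use PDFBox: `SharedResourcesDemo`, its nested `Page` class, and the `Map<String, String>` standing in for a page's XObject resources are hypothetical simplifications. It shows that removing an entry through one page's shared resources also removes it for every other page referencing the same object, and that giving the page a private copy first avoids this. In PDFBox terms, the analogous (untested) approach would be to give the page its own copy of its resources, including the XObject sub-dictionary, and assign it with `PDPage.setResources` before removing anything.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical, simplified model: a "page" only holds a reference to a map of
 * XObject name -> XObject. Several pages may reference the same map, just as
 * several PDF pages may reference the same Resources dictionary.
 */
final class SharedResourcesDemo {

    /** Hypothetical stand-in for a PDF page; only its XObject resources are modelled. */
    static final class Page {
        Map<String, String> xObjects;

        Page(Map<String, String> xObjects) {
            this.xObjects = xObjects;
        }
    }

    /** Removes the entry from whatever map the page references, affecting every page that shares it. */
    static void removeInPlace(Page page, String name) {
        page.xObjects.remove(name);
    }

    /** Gives the page its own copy of the map first, so other pages keep the original entry. */
    static void removeFromPrivateCopy(Page page, String name) {
        Map<String, String> copy = new HashMap<>(page.xObjects);
        copy.remove(name);
        page.xObjects = copy;
    }
}
```

The copy is shallow: only the name-to-object map is duplicated while the objects themselves stay shared, which is all that is needed to remove an entry for a single page.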

Another possible cause: while iterating over the resources of the page, you remove all matching image XObjects. But while iterating over the instructions of the page, you only remove the showing instructions for one matching image XObject, the last one found. If there are multiple matching image XObjects on a page, showing instructions for some of them may remain while the XObjects themselves are removed; this could also cause the error message you observed.
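The second point can be addressed by remembering every removed name instead of a single `qrCodeCosName`. The sketch below is again a simplified, stdlib-only model rather than PDFBox code: operand names are plain `String`s and the hypothetical `DoFilter.Op` class stands in for PDFBox's `Operator`. It keeps the question's approach of dropping the name operand that precedes each matching `Do`, but tests the name against a `Set`. In the question's code this would correspond to collecting each `propertyName.getName()` into a `Set<String>` in the resource loop and checking `cosName.getName()` against that set in the token loop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical model of a parsed content stream: operands are Strings (XObject
 * names without the leading slash) and operators are Op instances.
 */
final class DoFilter {

    /** Hypothetical stand-in for PDFBox's Operator token. */
    static final class Op {
        final String name;

        Op(String name) {
            this.name = name;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Op && ((Op) o).name.equals(name);
        }

        @Override
        public int hashCode() {
            return name.hashCode();
        }

        @Override
        public String toString() {
            return name;
        }
    }

    /** Drops every "Do" whose operand is in removedNames, together with that operand. */
    static List<Object> removeDo(List<Object> tokens, Set<String> removedNames) {
        List<Object> out = new ArrayList<>();
        for (Object token : tokens) {
            if (token instanceof Op && ((Op) token).name.equals("Do") && !out.isEmpty()) {
                Object operand = out.get(out.size() - 1);
                if (operand instanceof String && removedNames.contains(operand)) {
                    out.remove(out.size() - 1); // drop the XObject name operand...
                    continue;                   // ...and skip the Do operator itself
                }
            }
            out.add(token);
        }
        return out;
    }
}
```

Checking the operand type before using it also avoids a `ClassCastException` if a `Do` is ever preceded by something other than a name.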

There might also be other issues. For a more specific analysis, please share a representative example PDF.

huangapple
  • Posted on 2023-02-06 20:07:37
  • When reposting, please keep this link: https://go.coder-hub.com/75361114.html