February 6, 2023 20:07:37 · go


When opening a PDF file in Adobe Reader, a warning appears: "An error exists on this page. Acrobat may not display the page correctly"

Question

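I am currently using the PDFBox 2.x library to remove and add a QR code image after loading a PDF file from the file system. After I remove the QR code from the PDF file, save the modified document and open it in Adobe Reader, the warning above appears: "An error exists on this page. Acrobat may not display the page correctly". The QR code image is removed successfully, but the warning is shown when the document is opened.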

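Also, before removing the QR code image from the PDF file, the file size was 6.8 MB. After removing the QR code and saving, the file size increased to 8.1 MB.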

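The modified document, without the QR code image, should open without the warning "An error exists on this page. Acrobat may not display the page correctly". The original file opens without any warning.

I also expected the file size to decrease, or at least stay the same, after the QR code image was removed, rather than increase.

Can you please help?

Below is the code that removes the QR code image from the PDF file: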

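import java.io.File;
import java.io.IOException;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;

import org.apache.commons.lang3.StringUtils;
import org.apache.pdfbox.contentstream.operator.Operator;
import org.apache.pdfbox.cos.COSDictionary;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdfparser.PDFStreamParser;
import org.apache.pdfbox.pdfwriter.ContentStreamWriter;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.pdmodel.common.PDStream;
import org.apache.pdfbox.pdmodel.graphics.PDXObject;
import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;
import org.apache.xmpbox.XMPMetadata;
import org.apache.xmpbox.xml.DomXmpParser;

// Load the source PDF, import every page into a new document and strip tagged QR images page by page
PDDocument pdDocument = PDDocument.load(new File(aBarcodeVO.getSourceFilePath()));
PDDocument newDocument = new PDDocument();
List<Boolean> qRCodePageList = new ArrayList<>();
for (int pageCount = 0; pageCount < pdDocument.getNumberOfPages(); pageCount++) {
    PDPage pdPage = newDocument.importPage(pdDocument.getPage(pageCount));
    String imgUniqueId = aBarcodeVO.getImgUniqueId().concat(String.valueOf(pageCount));
    boolean hasQRCodeOnPage = removeQRCodeImage(newDocument, pdPage, imgUniqueId);
    qRCodePageList.add(hasQRCodeOnPage);
}
// Only save when at least one page actually contained a QR code
if (qRCodePageList.contains(true)) {
    newDocument.save(aBarcodeVO.getDestinationFilePath(true));
}
newDocument.close();
pdDocument.close();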

public static boolean removeQRCodeImage(PDDocument document, PDPage page, String imgUniqueId) throws Exception {
    String qrCodeCosName = null;
    PDResources pdResources = page.getResources();
    boolean hasQRCodeOnPage = false;
    for (COSName propertyName : pdResources.getXObjectNames()) {
        if (!pdResources.isImageXObject(propertyName)) {
            continue;
        }
        PDXObject o;
        try {
            o = pdResources.getXObject(propertyName);
            if (o instanceof PDImageXObject) {
                PDImageXObject pdImageXObject = (PDImageXObject) o;
                if (pdImageXObject.getMetadata() != null) {        
                    DomXmpParser xmpParser = new DomXmpParser();
                    XMPMetadata xmpMetadata = xmpParser.parse(pdImageXObject.getMetadata().toByteArray());
                    
                    if (xmpMetadata.getDublinCoreSchema() != null &&
                        StringUtils.isNoneBlank(xmpMetadata.getDublinCoreSchema().getTitle()) &&
                        xmpMetadata.getDublinCoreSchema().getTitle().contains("_barcodeimg_")) {
                        ((COSDictionary) pdResources.getCOSObject().getDictionaryObject(COSName.XOBJECT))
                            .removeItem(propertyName);
                        log.debug("propertyName REMOVED--" + propertyName.getName());
                        qrCodeCosName = propertyName.getName();
                        hasQRCodeOnPage = true;
                    }
                }
            }
        } catch (IOException e) {
            log.error("Exception in removeQRCodeImage() while extracting QR image: " + e, e);
        }
    }
    PDFStreamParser parser = new PDFStreamParser(page);
    parser.parse();
    List<Object> tokens = parser.getTokens();
    log.debug("original tokens size" + tokens.size());
    List<Object> newTokens = new ArrayList<Object>();
    for (int j = 0; j < tokens.size(); j++) {
        Object token = tokens.get(j);
        if (token instanceof Operator) {
            Operator op = (Operator) token;
            // find image - remove it
            if (op.getName().equals("Do")) {
                COSName cosName = (COSName) tokens.get(j - 1);
                if (cosName.getName().equals(qrCodeCosName)) {
                    // drop the image name operand already copied, then skip the Do operator
                    newTokens.remove(newTokens.size() - 1);
                    continue;
                }
            }
        }
        // operands and all other operators are copied unchanged
        newTokens.add(token);
    }
    log.debug("tokens size" + newTokens.size());
    PDStream newContents = new PDStream(document);
    OutputStream out = newContents.createOutputStream();
    ContentStreamWriter writer = new ContentStreamWriter(out);
    writer.writeTokens(newTokens);
    out.close();
    page.setContents(newContents);
    return hasQRCodeOnPage;
}


Answer 1

Score: 0

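A possible error: PDF resources can be shared across pages; the same Resources object may even be used for multiple pages. If your document is of that kind, your manipulation of the resources of one page may actually manipulate the resources of all pages, while your content stream manipulation changes only a single page. Uses of the same image on other pages could therefore cause the error message you observed.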
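The resource-sharing problem can be illustrated with a plain-Java model rather than PDFBox code: a `Map` plays the role of a page's `/XObject` resource dictionary, and the `Page` class and method names are hypothetical. The sketch shows that an in-place removal on a shared dictionary also removes the image from the other page, while copying the dictionary first confines the change to one page. In a real PDF the sharing can come from several pages referencing the same indirect `/Resources` (or `/XObject`) dictionary, or from resources inherited from a parent `/Pages` node. With PDFBox 2.x the analogous (untested) step would be to build new `COSDictionary` copies of the page's resources and of its `/XObject` subdictionary and attach them with `page.setResources(new PDResources(copy))` before removing anything.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model (not PDFBox API): a Map stands in for a page's
// /XObject resource dictionary, mapping resource names to image identifiers.
final class SharedResourcesDemo {

    static final class Page {
        Map<String, String> xObjects; // may be the very same instance on several pages

        Page(Map<String, String> xObjects) {
            this.xObjects = xObjects;
        }
    }

    // Edits the dictionary in place: if it is shared, every page referencing it
    // loses the entry, although only this page's content stream gets rewritten.
    static void removeInPlace(Page page, String name) {
        page.xObjects.remove(name);
    }

    // Copies the dictionary first, so only this page stops referencing the image.
    static void removeWithPrivateCopy(Page page, String name) {
        page.xObjects = new HashMap<>(page.xObjects);
        page.xObjects.remove(name);
    }

    // Two pages that share one resource dictionary, as some PDF producers emit.
    static List<Page> twoPagesSharingResources() {
        Map<String, String> shared = new HashMap<>();
        shared.put("Im1", "qr_barcodeimg_0");
        shared.put("Im2", "logo");
        List<Page> pages = new ArrayList<>();
        pages.add(new Page(shared));
        pages.add(new Page(shared));
        return pages;
    }

    public static void main(String[] args) {
        List<Page> a = twoPagesSharingResources();
        removeInPlace(a.get(0), "Im1");
        System.out.println(a.get(1).xObjects.containsKey("Im1")); // false: page 2 lost Im1 too

        List<Page> b = twoPagesSharingResources();
        removeWithPrivateCopy(b.get(0), "Im1");
        System.out.println(b.get(1).xObjects.containsKey("Im1")); // true: page 2 still has Im1
    }
}
```

Only the dictionary is copied; the values it points to stay shared, so the per-page cost of this approach is small.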

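Another possible error: while iterating over the resources of the page, you remove all matching image XObjects. But while iterating over the instructions of the page, you only remove the showing instructions for one matching image XObject, the last one found. If a page has multiple matching image XObjects, showing instructions for some of them may remain while the XObjects themselves are removed; this could also cause the observed error message.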
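The second point amounts to tracking every removed name instead of a single `qrCodeCosName` string. The sketch below is again a plain-Java model, not PDFBox code: `Op` and `Name` are hypothetical stand-ins for an operator token and a name operand. It collects the names to drop in a `Set` and removes each `/name Do` pair whose name is in that set. Applied to the question's code, the same idea would mean adding each removed name to a `Set<String>` during the resource loop and testing membership where the code currently compares against `qrCodeCosName`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-ins for content-stream tokens (not PDFBox classes):
// Op plays the role of an operator such as "Do", Name the role of an operand such as /Im1.
final class DoFilterDemo {

    static final class Op {
        final String name;
        Op(String name) { this.name = name; }
        @Override public String toString() { return name; }
    }

    static final class Name {
        final String value;
        Name(String value) { this.value = value; }
        @Override public String toString() { return "/" + value; }
    }

    // Removes every "/X Do" pair whose X is in removedNames, so all matching
    // images lose their showing instruction, not only the last one found.
    static List<Object> removeDoCalls(List<Object> tokens, Set<String> removedNames) {
        List<Object> result = new ArrayList<>();
        for (Object token : tokens) {
            boolean isDo = token instanceof Op && ((Op) token).name.equals("Do");
            if (isDo && !result.isEmpty()) {
                Object operand = result.get(result.size() - 1);
                if (operand instanceof Name && removedNames.contains(((Name) operand).value)) {
                    result.remove(result.size() - 1); // drop the operand copied just before
                    continue;                         // and skip the Do operator itself
                }
            }
            result.add(token); // operands and all other operators are kept
        }
        return result;
    }

    public static void main(String[] args) {
        List<Object> tokens = Arrays.asList(
                new Name("QR1"), new Op("Do"),
                new Name("Logo"), new Op("Do"),
                new Name("QR2"), new Op("Do"));
        Set<String> removed = new HashSet<>(Arrays.asList("QR1", "QR2"));
        System.out.println(removeDoCalls(tokens, removed)); // [/Logo, Do]
    }
}
```

Passing only `{"QR2"}` reproduces the behavior the answer describes: the `/QR1 Do` pair survives even though its image would have been removed from the resources.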

There might be other issues as well. For a more specific analysis, please share a representative example PDF.
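On the file-size part of the question (6.8 MB growing to 8.1 MB), one possible contributor, offered as an assumption that has not been checked against the asker's file: in PDFBox 2.x, `PDStream.createOutputStream()` called without an argument writes the stream with no filter, so the rewritten page content is stored uncompressed, whereas `createOutputStream(COSName.FLATE_DECODE)` asks PDFBox to Flate-compress it. The stdlib-only sketch below does not use PDFBox; it compresses a made-up, repetitive content stream (the `sampleContentStream` helper is hypothetical) with `java.util.zip.Deflater`, which produces the zlib format that `/FlateDecode` decodes, to show how large the gap can be.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

final class FlateSizeDemo {

    // Compresses bytes with java.util.zip.Deflater (zlib format, as used by /FlateDecode).
    static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end(); // release native resources
        return out.toByteArray();
    }

    // Builds a hypothetical, repetitive page content stream of text-drawing operators.
    static byte[] sampleContentStream(int lines) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < lines; i++) {
            sb.append("BT /F1 12 Tf 72 ").append(700 - (i % 50) * 12)
              .append(" Td (Sample line of page text) Tj ET\n");
        }
        return sb.toString().getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] raw = sampleContentStream(2000);
        byte[] compressed = deflate(raw);
        System.out.println("raw bytes:        " + raw.length);
        System.out.println("flate-compressed: " + compressed.length);
    }
}
```

If the rewritten content streams do turn out to be stored without a filter, passing `COSName.FLATE_DECODE` to `createOutputStream` would be a reasonable first thing to try before comparing the two files object by object.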

