itext2 to itext7: Merging PDF files
Question
Years ago, I wrote a small app in iText 2 to gather reports on a weekly basis and concatenate them into one PDF. The app used `com.lowagie.text.pdf.PdfCopy` to copy and merge the PDFs, and it worked fine, performing exactly as expected.
A few weeks ago I looked into migrating the application to iText 7. To that end, I used the `copyPagesTo` method of `com.itextpdf.kernel.pdf.PdfDocument`. When run on the same file set, this produces warnings like:
```
WARN PdfNameTree - Name "section.1" already exists in the name tree; old value will be replaced by the new one.
```
When I click the link to "section.1" in the first document of the merged PDF, I am taken to "section.1" of the last document. That is not what I expected, and not what happens with the iText 2 app: in the PDFs produced by iText 2, clicking the link to "section.1" in the first document of the combined PDF takes me to section 1 of the first document.
There is a hint in the Javadocs for `copyPagesTo` saying:
> If outlines destination names are the same in different documents, all
> such outlines will lead to a single location in the resultant
> document. In this case iText will log a warning. This can be avoided
> by renaming destinations names in the source document.
There is, however, no explanation of how this should be done. I find it odd that this should be necessary in iText 7 when it wasn't in iText 2.
Is there a simple way to get around this problem?
I've also tried the Sejda desktop app and it produces correct results, but I would prefer to automate the process through a batch script.
Answer 1
Score: 1
My guess is iText 2 didn't even know it might be a problem.
If iText can't deduplicate destination names, the procedure is roughly:
1. Follow `/Catalog` -> `/Names` -> `/Dests` in each document to find the destination name tree.
2. Deduplicate the names by adding suffixes. Remember that a name with a suffix added might be equal to an existing name in the same or another document. Be careful!
3. Now you can rewrite the destination name trees. Because you have only added suffixes, you can usually do this in place. As long as no name is a prefix of another, the lexicographic ordering of the names is unaltered, so the search tree structure is not broken. (If one name is a prefix of another, e.g. `section.1` and `section.10`, the first character of the suffix decides whether their order flips.)
4. Now rewrite the destination links in each PDF to use the new names. This covers, for example, any dictionary entry with key `/Dest` and any `/D` in a `/GoTo` action.
Now, after all this preprocessing, the files will merge without name clashes.
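Renaming a name tree in place is only safe if its keys stay in sorted order. Appending a suffix never changes how two names compare unless one name is a prefix of the other. In that case the first character of the suffix decides. The following is a minimal plain-Java sketch (no PDF library) that checks a candidate suffix against one document's names. The class and method names are hypothetical. It also assumes ASCII names, for which `String.compareTo` agrees with the byte-wise ordering of name-tree keys.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SuffixOrderCheck {

    // Returns true if appending `suffix` to every name keeps the names in the
    // same sorted order. Assumes ASCII names, for which String.compareTo
    // agrees with the byte-wise ordering used for name-tree keys.
    static boolean suffixPreservesOrder(List<String> names, String suffix) {
        List<String> sorted = new ArrayList<>(names);
        Collections.sort(sorted);
        // Checking adjacent pairs is enough: if each renamed pair is still
        // in order, the whole renamed sequence is still sorted.
        for (int i = 1; i < sorted.size(); i++) {
            String prev = sorted.get(i - 1) + suffix;
            String next = sorted.get(i) + suffix;
            if (prev.compareTo(next) > 0) {
                return false; // this pair would now be out of order
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // "section.1" is a prefix of "section.10", so the suffix's first
        // character decides how the renamed pair compares
        List<String> names = List.of("section.1", "section.10", "section.2");
        System.out.println(suffixPreservesOrder(names, "-doc1")); // prints true  ('-' < '0')
        System.out.println(suffixPreservesOrder(names, "_doc1")); // prints false ('_' > '0')
    }
}
```

If the check fails for a document, rebuilding the tree from its sorted keys is the safe fallback, rather than patching it in place.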
(I know all this because I've just implemented it for my own PDF software. It's slightly hairy stuff, but not intractable.)
If you would like to test it, I can provide a development version of cpdf with this functionality.
Answer 2
Score: 0
Following up on the reply from @johnwhitington, I got the following procedure to work.
As John suggests, start by renaming the destinations in the name tree:
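```java
import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfName;
import com.itextpdf.kernel.pdf.PdfNameTree;
import com.itextpdf.kernel.pdf.PdfObject;
import com.itextpdf.kernel.pdf.PdfString;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Appends `suffix` to every key of the /Dests name tree and returns a map
// from each old name to its new name. Written for the iText 7 API in which
// PdfNameTree.getNames() returns a Map<String, PdfObject>.
Map<String, PdfString> renameDestinations(PdfDocument pdf, String suffix) {
    Map<String, PdfString> renamed = new HashMap<>();
    PdfNameTree nameTree = pdf.getCatalog().getNameTree(PdfName.Dests);
    Map<String, PdfObject> names = nameTree.getNames();
    // Iterate over a copy of the keys, because the map is modified in the loop
    Set<String> ks = new HashSet<>(names.keySet());
    for (String oldName : ks) {
        PdfString newName = new PdfString(oldName + suffix);
        PdfObject value = names.remove(oldName);
        names.put(newName.toUnicodeString(), value);
        renamed.put(oldName, newName);
    }
    // Flag the name tree as changed
    nameTree.setModified();
    return renamed;
}
```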
Then change the destinations in the outlines to the new destinations:
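```java
import com.itextpdf.kernel.pdf.PdfName;
import com.itextpdf.kernel.pdf.PdfObject;
import com.itextpdf.kernel.pdf.PdfOutline;
import com.itextpdf.kernel.pdf.PdfString;
import com.itextpdf.kernel.pdf.navigation.PdfDestination;
import com.itextpdf.kernel.pdf.navigation.PdfNamedDestination;
import com.itextpdf.kernel.pdf.navigation.PdfStringDestination;
import java.util.Map;

// Recursively points every outline entry that uses a named destination
// at the renamed destination.
void renameOutlines(PdfOutline pdo, Map<String, PdfString> old2new) {
    for (PdfOutline child : pdo.getAllChildren()) {
        renameOutlines(child, old2new);
    }
    PdfDestination dest = pdo.getDestination();
    // Only name and string destinations refer to named destinations;
    // explicit destinations and outlines without a destination are skipped
    if (!(dest instanceof PdfNamedDestination || dest instanceof PdfStringDestination)) {
        return;
    }
    PdfObject destObj = dest.getPdfObject();
    String oldDest = null;
    if (destObj.isString()) {
        oldDest = ((PdfString) destObj).toUnicodeString();
    } else if (destObj.isName()) {
        oldDest = ((PdfName) destObj).getValue();
    }
    if (oldDest != null && old2new.containsKey(oldDest)) {
        pdo.addDestination(new PdfStringDestination(old2new.get(oldDest)));
    }
}
```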
Finally, change the links on each page to use the new destinations:
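```java
import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfName;
import com.itextpdf.kernel.pdf.PdfObject;
import com.itextpdf.kernel.pdf.PdfString;
import com.itextpdf.kernel.pdf.annot.PdfAnnotation;
import java.util.List;
import java.util.Map;

// Points every annotation whose /Dest is a renamed destination name at the
// new name. Only /Dest entries holding a name are handled; links that use a
// /GoTo action or a string /Dest are left unchanged.
void renameLinks(PdfDocument pdf, Map<String, PdfString> renamed) {
    int numPages = pdf.getNumberOfPages();
    for (int p = 1; p <= numPages; p++) { // iText page numbers are 1-based
        List<PdfAnnotation> annots = pdf.getPage(p).getAnnotations();
        for (PdfAnnotation annot : annots) {
            PdfObject dest = annot.getPdfObject().get(PdfName.Dest);
            if (dest != null && dest.isName()) {
                String oldString = ((PdfName) dest).getValue();
                if (renamed.containsKey(oldString)) {
                    annot.getPdfObject().put(PdfName.Dest, renamed.get(oldString));
                }
            }
        }
    }
}
```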
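`renameLinks` only rewrites a `/Dest` entry that holds a name. The first answer also mentions links that go through a `/GoTo` action, and `/Dest` may hold a string instead of a name. The sketch below shows that extra case analysis on a deliberately simplified model so that it runs without iText. In the model, PDF dictionaries are `Map<String, Object>`, names and strings are both plain `String`s, and explicit destinations are `List`s. The class and method names are hypothetical. With iText, the same checks would presumably be written with `getAsDictionary(PdfName.A)`, `PdfName.S`, `PdfName.GoTo` and `PdfName.D`, but that port is not shown here.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LinkTargetRenamer {

    // Simplified model: a PDF dictionary is a Map<String, Object>, PDF names
    // and strings are both Java Strings, explicit destinations are Lists.
    // For each link annotation, renames
    //   1. a /Dest entry that holds a destination name, and
    //   2. the /D entry of a /GoTo action stored under /A.
    @SuppressWarnings("unchecked")
    static void renameLinkTargets(List<Map<String, Object>> annots,
                                  Map<String, String> renamed) {
        for (Map<String, Object> annot : annots) {
            renameIfNamed(annot, "Dest", renamed);
            Object action = annot.get("A");
            if (action instanceof Map) {
                Map<String, Object> act = (Map<String, Object>) action;
                if ("GoTo".equals(act.get("S"))) {
                    renameIfNamed(act, "D", renamed);
                }
            }
        }
    }

    // Replaces dict[key] only if it is a name/string found in `renamed`;
    // explicit destinations (Lists) are left alone.
    private static void renameIfNamed(Map<String, Object> dict, String key,
                                      Map<String, String> renamed) {
        Object value = dict.get(key);
        if (value instanceof String && renamed.containsKey(value)) {
            dict.put(key, renamed.get(value));
        }
    }

    public static void main(String[] args) {
        Map<String, String> renamed = Map.of("section.1", "section.1-doc1");

        Map<String, Object> direct = new HashMap<>();
        direct.put("Dest", "section.1");

        Map<String, Object> goTo = new HashMap<>();
        goTo.put("S", "GoTo");
        goTo.put("D", "section.1");
        Map<String, Object> viaAction = new HashMap<>();
        viaAction.put("A", goTo);

        renameLinkTargets(List.of(direct, viaAction), renamed);
        System.out.println(direct.get("Dest")); // prints section.1-doc1
        System.out.println(goTo.get("D"));      // prints section.1-doc1
    }
}
```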
Of course, the suffix should be unique to each file being merged.
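The answer leaves the choice of suffix open, and the first answer warns that a suffixed name can coincide with a name that already exists. One way to address both points is to pick each file's suffix up front, checked against the names of all files. The sketch below is plain Java and makes three assumptions:

- Each document's `/Dests` keys have already been collected into a `Set<String>`, for example from the map returned by `getNames()` in `renameDestinations`.
- Suffixes follow a hypothetical `-docN` scheme.
- The class and method names are illustrative.

Each returned suffix could then be passed as the `suffix` argument of `renameDestinations`.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SuffixPicker {

    // Picks one suffix per document (each given as its set of destination
    // names) so that no renamed destination equals an original name in any
    // document or a name already produced for an earlier document.
    static List<String> pickSuffixes(List<Set<String>> docs) {
        Set<String> taken = new HashSet<>();
        for (Set<String> names : docs) {
            taken.addAll(names);
        }
        List<String> suffixes = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            Set<String> names = docs.get(i);
            String base = "-doc" + (i + 1);
            String suffix = base;
            // Fall back to "-docN.2", "-docN.3", ... until nothing clashes
            for (int attempt = 2; clashes(names, suffix, taken); attempt++) {
                suffix = base + "." + attempt;
            }
            for (String name : names) {
                taken.add(name + suffix);
            }
            suffixes.add(suffix);
        }
        return suffixes;
    }

    private static boolean clashes(Set<String> names, String suffix, Set<String> taken) {
        for (String name : names) {
            if (taken.contains(name + suffix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Document 1 already contains a name equal to "section.1" + "-doc2"
        List<Set<String>> docs = List.of(
                Set.of("section.1", "section.1-doc2"),
                Set.of("section.1"));
        System.out.println(pickSuffixes(docs)); // prints [-doc1, -doc2.2]
    }
}
```

Picking suffixes per file, rather than per name, keeps the renaming step above unchanged. The check against every document's names is deliberately conservative.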