PDFBox renderImageWithDPI produces images with missing content due to absent embedded fonts – how do I resolve this?

Question


PDFBox renderImageWithDPI only partially renders text because of missing embedded(?) fonts.

  • Using PDFBox 2.0.28; I then also tried PDFBox 3.0.0-RC1.

  • Created a PDDocument using Loader.loadPDF.

  • Created a PDFRenderer from the PDDocument.

  • Called renderImageWithDPI(pagenum, dpi, RGBObj) on the PDFRenderer.

  • Obtained a java.awt.image.BufferedImage.

  • Wrote it as a JPG using javax.imageio.ImageIO.

  • However, content is missing from the images.

  • Extracted 2 sample problematic pages from the PDF using PDFsam Basic:

  • Pg 1, which generates image 1,

  • and Pg 2, which generates image 2.

  • I have highlighted the areas where content is missing.
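To rule out the last step of the pipeline above (writing the BufferedImage as a JPG), here is a minimal sketch using only JDK classes. ImageIO.write returns false instead of throwing when no suitable writer accepts the image, so the return value is worth checking. The class and method names (JpegWriter, toRgb, writeJpeg) are hypothetical helpers, not part of PDFBox. The conversion to an opaque RGB image is a safeguard only, since the renderer call above already requests an RGB image.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

// Hypothetical helper (not a PDFBox class): writes a rendered page as JPEG.
final class JpegWriter {

    /** Copies the image onto a white, opaque RGB canvas, since JPEG has no alpha channel. */
    static BufferedImage toRgb(BufferedImage src) {
        if (src.getType() == BufferedImage.TYPE_INT_RGB) {
            return src;
        }
        BufferedImage rgb = new BufferedImage(
                src.getWidth(), src.getHeight(), BufferedImage.TYPE_INT_RGB);
        Graphics2D g = rgb.createGraphics();
        try {
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, rgb.getWidth(), rgb.getHeight());
            g.drawImage(src, 0, 0, null);
        } finally {
            g.dispose();
        }
        return rgb;
    }

    /** Writes the image as JPEG and fails loudly if no writer accepted it. */
    static void writeJpeg(BufferedImage image, File target) throws IOException {
        if (!ImageIO.write(toRgb(image), "jpg", target)) {
            throw new IOException("No JPEG writer accepted the image for " + target);
        }
    }
}
```

If writeJpeg succeeds and the output still lacks text, the problem lies in rendering rather than in the image export.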

  • Running PreflightParser.validate gives the messages below:

1.4 : Trailer Syntax error, /XRef cross reference streams are not allowed
5.2.2 : Forbidden field in an annotation definition, Flags of Link annotation are invalid
2.3.2 : Unexpected value for key in Graphic object definition, Unexpected 'true' value for 'Interpolate' Key
2.4.2 : Invalid Color space, The operator "k" can't be used with RGB Profile
2.4.3 : Invalid Color space, The operator "f" can't be used without Color Profile
3.1.4 : Invalid Font definition, ELWKFI+OptimaLTStd: The Charset entry is missing for the Type1 Subset
3.1.4 : Invalid Font definition, JECWGC+InsigniaLTStd: The Charset entry is missing for the Type1 Subset
3.1.4 : Invalid Font definition, PHSMMZ+OptimaLTStd-Bold: The Charset entry is missing for the Type1 Subset
3.1.4 : Invalid Font definition, EHCNNL+OptimaLTStd-Italic: The Charset entry is missing for the Type1 Subset
3.1.4 : Invalid Font definition, QBVSKF+HelveticaLTStd-Obl: The Charset entry is missing for the Type1 Subset
3.1.9 : Invalid Font definition, UBAPGG+OptimaLTStd: mandatory CIDToGIDMap missing
3.1.11 : Invalid Font definition, UBAPGG+OptimaLTStd: The CIDSet entry is missing for the Composite Subset
3.2.3 : Font damaged, UBAPGG+OptimaLTStd: The FontFile can't be read
3.1.9 : Invalid Font definition, ORMCFE+HelveticaLTStd-Obl: mandatory CIDToGIDMap missing
3.1.11 : Invalid Font definition, ORMCFE+HelveticaLTStd-Obl: The CIDSet entry is missing for the Composite Subset
3.2.3 : Font damaged, ORMCFE+HelveticaLTStd-Obl: The FontFile can't be read
3.1.9 : Invalid Font definition, TFEWKU+HelveticaLTStd-Roman: mandatory CIDToGIDMap missing
3.1.11 : Invalid Font definition, TFEWKU+HelveticaLTStd-Roman: The CIDSet entry is missing for the Composite Subset
3.2.3 : Font damaged, TFEWKU+HelveticaLTStd-Roman: The FontFile can't be read
3.1.4 : Invalid Font definition, CRQQXS+OptimaLTStd: The Charset entry is missing for the Type1 Subset
3.1.4 : Invalid Font definition, MVVAWX+InsigniaLTStd: The Charset entry is missing for the Type1 Subset
3.1.4 : Invalid Font definition, YIWFBD+OptimaLTStd-Bold: The Charset entry is missing for the Type1 Subset
3.1.11 : Invalid Font definition, JYHLHF+OptimaLTStd: The CIDSet entry is missing for the Composite Subset
3.1.9 : Invalid Font definition, LDXBBC+OptimaLTStd: mandatory CIDToGIDMap missing
3.1.11 : Invalid Font definition, LDXBBC+OptimaLTStd: The CIDSet entry is missing for the Composite Subset
3.2.3 : Font damaged, LDXBBC+OptimaLTStd: The FontFile can't be read
3.1.9 : Invalid Font definition, FSNSYC+OptimaLTStd: mandatory CIDToGIDMap missing
3.1.11 : Invalid Font definition, FSNSYC+OptimaLTStd: The CIDSet entry is missing for the Composite Subset
3.2.3 : Font damaged, FSNSYC+OptimaLTStd: The FontFile can't be read
3.1.9 : Invalid Font definition, LVYKUL+InsigniaLTStd: mandatory CIDToGIDMap missing
3.1.11 : Invalid Font definition, LVYKUL+InsigniaLTStd: The CIDSet entry is missing for the Composite Subset
3.2.3 : Font damaged, LVYKUL+InsigniaLTStd: The FontFile can't be read
3.1.9 : Invalid Font definition, FUYTUP+OptimaLTStd-Italic: mandatory CIDToGIDMap missing
3.1.11 : Invalid Font definition, FUYTUP+OptimaLTStd-Italic: The CIDSet entry is missing for the Composite Subset
3.2.3 : Font damaged, FUYTUP+OptimaLTStd-Italic: The FontFile can't be read
3.1.9 : Invalid Font definition, GZVYQO+OptimaLTStd-Bold: mandatory CIDToGIDMap missing
3.1.11 : Invalid Font definition, GZVYQO+OptimaLTStd-Bold: The CIDSet entry is missing for the Composite Subset
3.2.3 : Font damaged, GZVYQO+OptimaLTStd-Bold: The FontFile can't be read
3.1.9 : Invalid Font definition, GWNIWZ+HelveticaLTStd-Roman: mandatory CIDToGIDMap missing
3.1.11 : Invalid Font definition, GWNIWZ+HelveticaLTStd-Roman: The CIDSet entry is missing for the Composite Subset
3.2.3 : Font damaged, GWNIWZ+HelveticaLTStd-Roman: The FontFile can't be read
7.1 : Error on MetaData, Metadata is not a stream
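The list above is long but repetitive. As a rough sketch, assuming the messages have been collected as plain strings in the "code : category, detail" shape shown above, the following JDK-only helper counts messages per error code and lists the fonts reported under a given code. PreflightSummary and its methods are hypothetical names, not part of the Preflight API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: summarises validation messages given as plain text lines.
final class PreflightSummary {

    /** Counts messages per leading error code such as "3.1.4" (keys sorted as strings). */
    static Map<String, Integer> countByCode(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            int sep = line.indexOf(" : ");
            if (sep <= 0) {
                continue; // not a "code : message" line
            }
            counts.merge(line.substring(0, sep).trim(), 1, Integer::sum);
        }
        return counts;
    }

    /** Returns the font names reported under one code, e.g. "3.2.3" (Font damaged). */
    static List<String> fontsFor(String code, List<String> lines) {
        List<String> fonts = new ArrayList<>();
        String prefix = code + " : ";
        for (String line : lines) {
            if (!line.startsWith(prefix)) {
                continue;
            }
            // The font name sits between the first ", " and the following ": ".
            int start = line.indexOf(", ", prefix.length());
            int end = start < 0 ? -1 : line.indexOf(": ", start);
            if (end > start) {
                fonts.add(line.substring(start + 2, end));
            }
        }
        return fonts;
    }
}
```

Calling fontsFor("3.2.3", lines) on the output above would list the fonts whose FontFile could not be read, which can then be compared with the fonts named in the runtime warnings.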

These are corroborated by the warnings logged during execution:

May 26, 2023 12:40:01 PM org.apache.pdfbox.pdmodel.font.PDCIDFontType2 <init>
WARNING: Could not read embedded OTF for font GWNIWZ+HelveticaLTStd-Roman
java.io.IOException: head is mandatory
	at org.apache.fontbox.ttf.TTFParser.parseTables(TTFParser.java:182)
	at org.apache.fontbox.ttf.TTFParser.parse(TTFParser.java:150)
	at org.apache.fontbox.ttf.OTFParser.parse(OTFParser.java:79)
	at org.apache.fontbox.ttf.OTFParser.parse(OTFParser.java:27)
	at org.apache.fontbox.ttf.TTFParser.parse(TTFParser.java:106)
	at org.apache.fontbox.ttf.OTFParser.parse(OTFParser.java:73)
	at org.apache.pdfbox.pdmodel.font.PDCIDFontType2.<init>(PDCIDFontType2.java:114)
	at org.apache.pdfbox.pdmodel.font.PDCIDFontType2.<init>(PDCIDFontType2.java:67)
	at org.apache.pdfbox.pdmodel.font.PDFontFactory.createDescendantFont(PDFontFactory.java:138)
	at org.apache.pdfbox.pdmodel.font.PDType0Font.<init>(PDType0Font.java:88)
	at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:96)
	at org.apache.pdfbox.pdmodel.PDResources.getFont(PDResources.java:143)
	at org.apache.pdfbox.contentstream.operator.text.SetFontAndSize.process(SetFontAndSize.java:66)
	at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:849)
	at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:495)
	at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:469)
	at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:142)
	at org.apache.pdfbox.rendering.PageDrawer.drawPage(PageDrawer.java:264)
	at org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:338)
	at org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:259)
	at org.apache.pdfbox.rendering.PDFRenderer.renderImageWithDPI(PDFRenderer.java:245)

Additional truncated messages

May 26, 2023 12:40:00 PM org.apache.pdfbox.pdmodel.font.PDCIDFontType2 <init>
WARNING: Could not read embedded OTF for font UBAPGG+OptimaLTStd
java.io.IOException: head is mandatory

May 26, 2023 12:40:01 PM org.apache.pdfbox.pdmodel.font.PDCIDFontType2 <init>
WARNING: Could not read embedded OTF for font GZVYQO+OptimaLTStd-Bold
java.io.IOException: head is mandatory

May 26, 2023 12:40:01 PM org.apache.pdfbox.pdmodel.font.PDCIDFontType2 <init>
WARNING: Could not read embedded OTF for font FUYTUP+OptimaLTStd-Italic
java.io.IOException: head is mandatory

May 26, 2023 12:40:01 PM org.apache.pdfbox.pdmodel.font.PDCIDFontType2 <init>
WARNING: Could not read embedded OTF for font FSNSYC+OptimaLTStd
java.io.IOException: head is mandatory
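According to the stack trace above, the "head is mandatory" exception comes from TTFParser.parseTables, that is, the parser did not find a `head` table in the embedded font program. As a rough sketch, assuming the font program's bytes have already been extracted from the PDF (not shown here) and start with a standard sfnt/OpenType header, the JDK-only helper below lists the table tags in the table directory. SfntTables is a hypothetical name, and the result is meaningless for data that is not an sfnt container.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reads the table directory of an sfnt (TrueType/OpenType) font.
final class SfntTables {

    /** Returns the 4-character table tags listed in the font's table directory. */
    static List<String> tableTags(byte[] font) {
        List<String> tags = new ArrayList<>();
        if (font.length < 12) {
            return tags; // too short to hold the 12-byte sfnt header
        }
        ByteBuffer buf = ByteBuffer.wrap(font); // big-endian by default, as sfnt requires
        int numTables = buf.getShort(4) & 0xFFFF; // uint16 at offset 4
        for (int i = 0; i < numTables; i++) {
            int record = 12 + 16 * i; // each table record is 16 bytes, tag first
            if (record + 16 > font.length) {
                break; // truncated directory
            }
            tags.add(new String(font, record, 4, StandardCharsets.US_ASCII));
        }
        return tags;
    }

    /** True if the directory lists a 'head' table. */
    static boolean hasHead(byte[] font) {
        return tableTags(font).contains("head");
    }
}
```

If hasHead returns false for an embedded font, that matches the warning: the parser cannot use the font program as it stands, which is when PDFBox falls back to a substitute font.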

Although fallback fonts seem to be used, they don't work either.

May 26, 2023 12:40:01 PM org.apache.pdfbox.pdmodel.font.PDCIDFontType2 findFontOrSubstitute
WARNING: Using fallback font LiberationSans for CID-keyed TrueType font GWNIWZ+HelveticaLTStd-Roman

I also see warnings like the one below and am unsure how to address them.

May 26, 2023 12:40:01 PM org.apache.pdfbox.pdmodel.graphics.color.PDICCBased ensureDisplayProfile
WARNING: ICC profile is Perceptual, ignoring, treating as Display class

I need help with the following two questions.

Question 1: How do I add a font?

  • The code block below, in which I get a page and add a font to its resources before rendering, doesn't have any impact.
  • Note: getDocument(), setDocument() and setPdfRenderer() are convenience methods in my implementation class. setPdfRenderer() contains PDFRenderer renderer = new PDFRenderer(document); and assigns it to a class variable.
int position = 0;
PDPage page = getDocument().getPage(position);
PDResources resources = page.getResources();
OTFParser otfParser = new OTFParser();
OpenTypeFont otf = otfParser.parse(new File("OptimaLTStd.otf"));
PDFont font = PDType0Font.load(document, otf, false);

resources.add(font);
page.setResources(resources);
if (position == 0) {
   getDocument().getPages().remove(page);
   getDocument().getPages().add(page);
   setDocument(getDocument());
   setPdfRenderer(getDocument());
} else {
   PDPage prevPage = getDocument().getPage(position - 1);
   getDocument().getPages().insertBefore(page, prevPage);
   setDocument(getDocument());
   setPdfRenderer(getDocument());
}
  • Downloaded OTF from link

Question 2: Is there an override in PDFRenderer to skip glyph processing, so that font-related issues do not affect image generation?

Answer 1

Score: 0

The missing text is caused by width definitions of 0 for the fonts in the PDF, which incorrectly influence a "stretching" algorithm when rendering. This has been fixed in ticket PDFBOX-5611 and the fix will be in version 2.0.29. Until then, a snapshot build will be available.

huangapple
  • Posted on May 26, 2023 at 16:39:19
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76339096.html