How to insert an image in a PDF using Java

Question

I am trying to create PDFs with Java. I do not want to use libraries like ITextPdf but want to generate the raw code of the PDF with Java. I have the basics down and am able to create some PDFs. The thing I am struggling with is inserting images in the PDF. As far as I understand, JPEGs are the easiest to use since the raw data can be inserted into the PDF. When I try this, however, it does not work. I am reading the JPEG data with Java and inserting it into the PDF.

The generated PDF code looks like this:

%PDF-1.7

1 0 obj
  <<
    /Type /Catalog
    /Pages 2 0 R
  >>
endobj

2 0 obj
  <<
    /Type /Pages
    /Count 1
    /Kids [3 0 R]
  >>
endobj

---------------------------------------------

3 0 obj
  <<
    /Type /Page
    /Parent 2 0 R
    /MediaBox [0 0 595 842]
    /Contents [5 0 R]
    /Resources
    <<
      /XObject
      <<
        /Im1 4 0 R
      >>
    >>
  >>
endobj


5 0 obj 
  <<
    /Length 35
  >>
  stream 
    q 
    209 0 0 241 0 0 cm
    /Im1 Do 
    Q 
  endstream 
endobj

4 0 obj
  <<
    /Type /XObject
    /Filter /DCTDecode
    /Length 8000
    /ColorSpace /DeviceRGB
    /Height 241
    /Width 209
    /Subtype /Image
    /BitsPerComponent 8
  >>
    stream
      ***
    endstream
endobj

---------------------------------------------

trailer
  <<
    /Root 1 0 R
    /Size 5
  >>
%%EOF

The stream data marked as "***" are the image bytes I got by using the following (this is inserted in the file by Java):

java.nio.file.Files.readAllBytes(Paths.get(image.getPath()))

When viewing the PDF, it does not display the image. The image I used can be found here: https://rp-do-not-delete.s3.eu-central-1.amazonaws.com/download.jpeg.

I read the image width, height and byte length from the BufferedImage.getData() data, so, as far as I know, they should be correct.

Does anyone know what is wrong or how I can approach this problem? Feel free to ask for additional information, such as the Java code I used.

Answer 1

Score: 2

Your code is generally PURR-fect. Here is the result when the code and image are combined correctly, with the text typed in MS Notepad and the image added via the command line. It is short of one essential feature (the xref table before the trailer), but Acrobat Reader will tolerate that and write its own.

Personally, I tend to abbreviate the entries, as it is then easier to see what is happening inside the PDF.

%PDF-1.7

1 0 obj <</Type/Catalog/Pages 2 0 R>> endobj
2 0 obj <</Type/Pages/Count 1/Kids [3 0 R]>> endobj
3 0 obj <</Type/Page/Parent 2 0 R/MediaBox [0 0 595 842]/Contents [5 0 R]/Resources<</XObject<</Im1 4 0 R>>>>>> endobj
5 0 obj <</Length 35>>
stream 
q 
209 0 0 241 0 0 cm
/Im1 Do 
Q 

endstream 
endobj

4 0 obj <</Type/XObject/Filter/DCTDecode/Length 8000/ColorSpace/DeviceRGB/Height 241/Width 209/Subtype/Image/BitsPerComponent 8>>
stream
*** Image goes here
endstream
endobj

trailer <</Root 1 0 R/Size 5>>
%%EOF
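
The /Width, /Height and /Length entries of object 4 have to agree with the JPEG bytes that replace the *** placeholder. As a rough Java sketch (not part of the original answer), the values can be taken from the file itself: /Length is the number of bytes in the file, and the dimensions sit in the JPEG's SOF (start of frame) segment. The class name `JpegXObject` and the method `dictionaryFor` are hypothetical, and mapping 1, 3 and 4 components to DeviceGray, DeviceRGB and DeviceCMYK is a simplifying assumption.

```java
/** Hypothetical helper (not from the original post): builds the image XObject dictionary. */
final class JpegXObject {

    /** Reads a big-endian unsigned 16-bit value starting at index i. */
    private static int u16(byte[] d, int i) {
        return ((d[i] & 0xFF) << 8) | (d[i + 1] & 0xFF);
    }

    /** Walks the JPEG marker segments until an SOF segment is found. */
    static String dictionaryFor(byte[] jpeg) {
        if (jpeg.length < 4 || (jpeg[0] & 0xFF) != 0xFF || (jpeg[1] & 0xFF) != 0xD8) {
            throw new IllegalArgumentException("missing JPEG SOI marker");
        }
        int i = 2;
        while (i + 9 < jpeg.length) {
            if ((jpeg[i] & 0xFF) != 0xFF) {
                throw new IllegalArgumentException("expected a marker at offset " + i);
            }
            int marker = jpeg[i + 1] & 0xFF;
            if (marker == 0xFF) { // fill byte before the real marker
                i++;
                continue;
            }
            // SOF0..SOF15, excluding DHT (C4), JPG (C8) and DAC (CC)
            boolean sof = marker >= 0xC0 && marker <= 0xCF
                    && marker != 0xC4 && marker != 0xC8 && marker != 0xCC;
            if (sof) {
                int bits = jpeg[i + 4] & 0xFF;       // sample precision
                int height = u16(jpeg, i + 5);
                int width = u16(jpeg, i + 7);
                int components = jpeg[i + 9] & 0xFF; // assumption: 1=gray, 3=RGB, 4=CMYK
                String colorSpace = components == 1 ? "/DeviceGray"
                        : components == 4 ? "/DeviceCMYK" : "/DeviceRGB";
                return "<</Type/XObject/Subtype/Image/Filter/DCTDecode/ColorSpace" + colorSpace
                        + "/BitsPerComponent " + bits
                        + "/Width " + width + "/Height " + height
                        + "/Length " + jpeg.length + ">>";
            }
            int segmentLength = u16(jpeg, i + 2); // includes the two length bytes
            if (segmentLength < 2) {
                throw new IllegalArgumentException("bad segment length at offset " + i);
            }
            i += 2 + segmentLength;
        }
        throw new IllegalArgumentException("no SOF segment found");
    }
}
```

The byte array would typically come from Files.readAllBytes, as in the question. The image bytes should then be written to the output stream as bytes, not converted to a String first.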

In that state it will be 8,527 bytes (so only 527 more than the raw image). However, Acrobat Reader will save it as 12,665 bytes because it adjusts the file to be more compliant with web usage etc.

Thus we see the saved file is totally different (not necessarily better); it now starts:

%PDF-1.7
%âãÏÓ
7 0 obj
<</Linearized 1/L 12665/O 9/E 9017/N 1/T 12376/H [ 435 131]>>
endobj
                     
12 0 obj
<</DecodeParms<</Columns 3/Predictor 12>>/Filter/FlateDecode/ID[<84AE98C0D1CB3C4D9543B24450B2D64C><84AE98C0D1CB3C4D9543B24450B2D64C>]/Index[7 7]/Info 6 0 R/Length 36/Prev 12377/Root 8 0 R/Size 14/Type/XRef/W[1 2 0]>>stream
h&#222;bbd`b`Rcb`&#176;ab`œ
&#164;C™&#254;J &#217;&#246;  0&#196;
endstream
endobj
startxref
0
%%EOF
        
13 0 obj
<</Filter/FlateDecode/I 66/Length 52/S 36>>stream
h&#222;b```f``|&#207; 
&#199;0ŠBx™&#236;&#160;&#162;&#250;&#251;4#&#207;0 &#161;&#185;,
endstream
endobj
8 0 obj
<</Metadata 1 0 R/Pages 5 0 R/Type/Catalog>>

!!!

To avoid such adjustments you need to include an xref table of your own. For a very rough example in JScript, see below.

Note: copying and pasting from here will break the binary data.

var ByteStream = new ActiveXObject("ADODB.Stream");
ByteStream.Type = 2; // 2 = adTypeText
ByteStream.Charset = "Windows-1252"; // Best for PDF writer
var BS = ByteStream; // Abbreviate for ease of editing
BS.Open();
BS.Position = 0;

BS.WriteText("%PDF-1.0\n");
BS.WriteText("%ÅÑ§¡\n");

BS.WriteText("1 0 obj <</Type/Catalog/Pages 2 0 R>> endobj\n");
BS.WriteText("2 0 obj <</Type/Pages/Count 1/Kids[3 0 R]>> endobj\n");
BS.WriteText("3 0 obj <</Type/Page/MediaBox[0 0 144 144]/Rotate 0/Resources<</XObject<</Img0 4 0 R>>>>/Contents 5 0 R/Parent 2 0 R>> endobj\n");
BS.WriteText("4 0 obj <</Type/XObject/Subtype/Image/Height 25/Width 24/BitsPerComponent 1/Length 75/ColorSpace/DeviceGray>> stream\n");
// 1-bit image data; some bytes were lost in copy and paste (see note above)
BS.WriteText('ÿÿÿÿÿÿÀmß[}ÑoEÑ[EÑqEßE}ÀUÿñÿÁ«Á¬ÛZcýÖÇÈ"}ÿÕïÀMsß`§Ñ]9ÑNÑE·ßLÇÀA[ÿÿÿÿÿÿ');
BS.WriteText("\nendstream\nendobj\n");
var Pos1 = "000000000"+BS.Position; // used as the xref offset of object 5, trimmed to 10 digits below
BS.WriteText("5 0 obj <</Length 101>> stream\n");
BS.WriteText("q\n1 0 0 -1 18 54 cm\n35 0 0 -36 0 36 cm\n/Img0 Do\nQ\nq\n1 0 0 -1 71 144 cm\n70 0 0 -72 0 72 cm\n/Img0 Do\nQ\n");
BS.WriteText("\nendstream\nendobj\n\n");
var Pos2 = BS.Position; // used as the startxref value
BS.WriteText("xref\n0 6\n");
// Entry 0 is the free-list head; its generation number is 65535 (the original had 00001)
BS.WriteText("0000000000 65535 f \n0000000015 00000 n \n0000000060 00000 n \n0000000111 00000 n \n0000000237 00000 n \n"+Pos1.slice(-10)+" 00000 n \n");
BS.WriteText("\ntrailer\n<</Size 6/Info<</Producer(JScrip2pdf)>>/Root 1 0 R>>\nstartxref\n"+Pos2+"\n%%EOF\n");

BS.SaveToFile("HelloWorldB&W.pdf", 2); // 2 = overwrite an existing file
BS.Close();
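
Since the question is about Java, here is a rough Java sketch of the same idea: write every object into a byte buffer, remember the offset at which each one starts, and emit those offsets as the xref table. It is an illustration under stated assumptions, not the answerer's code. The class name `RawJpegPdf` and the method `build` are hypothetical, the page layout copies the question's PDF, and the image is assumed to be an 8-bit RGB JPEG whose width and height are already known.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper (not from the original post): one-page PDF around raw JPEG bytes. */
final class RawJpegPdf {

    /** Appends ASCII text to the buffer as bytes. */
    private static void put(ByteArrayOutputStream out, String s) {
        byte[] b = s.getBytes(StandardCharsets.US_ASCII);
        out.write(b, 0, b.length);
    }

    static byte[] build(byte[] jpeg, int width, int height) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        List<Integer> offsets = new ArrayList<>(); // offsets.get(i) = start of object i + 1

        put(out, "%PDF-1.7\n");

        offsets.add(out.size());
        put(out, "1 0 obj\n<</Type/Catalog/Pages 2 0 R>>\nendobj\n");

        offsets.add(out.size());
        put(out, "2 0 obj\n<</Type/Pages/Count 1/Kids[3 0 R]>>\nendobj\n");

        offsets.add(out.size());
        put(out, "3 0 obj\n<</Type/Page/Parent 2 0 R/MediaBox[0 0 595 842]/Contents 5 0 R"
                + "/Resources<</XObject<</Im1 4 0 R>>>>>>\nendobj\n");

        // Image object: the JPEG bytes go in untouched, directly after "stream\n".
        offsets.add(out.size());
        put(out, "4 0 obj\n<</Type/XObject/Subtype/Image/Filter/DCTDecode"
                + "/ColorSpace/DeviceRGB/BitsPerComponent 8"   // assumption: 8-bit RGB JPEG
                + "/Width " + width + "/Height " + height
                + "/Length " + jpeg.length + ">>\nstream\n");
        out.write(jpeg, 0, jpeg.length);
        put(out, "\nendstream\nendobj\n");

        // Content stream: scale the unit square to the image size and paint it.
        String content = "q\n" + width + " 0 0 " + height + " 0 0 cm\n/Im1 Do\nQ\n";
        offsets.add(out.size());
        put(out, "5 0 obj\n<</Length " + content.length() + ">>\nstream\n"
                + content + "endstream\nendobj\n");

        // Cross-reference table: every entry is exactly 20 bytes long.
        int xrefStart = out.size();
        int size = offsets.size() + 1; // plus the free entry for object 0
        put(out, "xref\n0 " + size + "\n0000000000 65535 f \n");
        for (int offset : offsets) {
            put(out, String.format(Locale.ROOT, "%010d 00000 n \n", offset));
        }
        put(out, "trailer\n<</Size " + size + "/Root 1 0 R>>\nstartxref\n"
                + xrefStart + "\n%%EOF\n");
        return out.toByteArray();
    }
}
```

A call such as Files.write(Paths.get("out.pdf"), RawJpegPdf.build(jpegBytes, 209, 241)) would then store the result. Keeping everything as bytes matters here: pushing the JPEG through a String or a Writer can alter the binary data and shift the offsets.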

A file with such a missing xref table can often be fixed simply with MuPDF-GL, but that is Windows only, so use the command-line MuTool for cross-platform cleaning.

It too will alter the file, but to a cleaner 8,653-byte format (so very few additions):

%PDF-1.7
%µ¶

1 0 obj
<</Type/Catalog/Pages 2 0 R>>
endobj

2 0 obj
<</Type/Pages/Count 1/Kids[3 0 R]>>
endobj

3 0 obj
<</Type/Page/Parent 2 0 R/MediaBox[0 0 595 842]/Contents 5 0 R/Resources<</XObject<</Im1 4 0 R>>>>>>
endobj

4 0 obj
<</Type/XObject/Filter/DCTDecode/Length 8000/ColorSpace/DeviceRGB/Height 241/Width 209/Subtype/Image/BitsPerComponent 8>>
stream
ÿØÿà JFIF      ÿÛ „

blah blah CAT Image bytes

NI dI$&#166;’I$ ’I$ ’I$&#199;(\’H&#39;&#168;+h’I/&#161;&#227;&#177;Œ&#208;&#168;Ÿ&#161;I%&#196;&#251;:Y]&#219;+&#248;BI-}&#232;&quot;&#221;–&#179;€&#166;V&#193;&#236;ž^‚‰&#205;I%&#212;s&#169;$„k{@I$’I%&#160;&#255;&#217;
endstream
endobj

5 0 obj
<</Length 35>>
stream
q
209 0 0 241 0 0 cm
/Im1 Do
Q
q
Q

endstream
endobj

xref
0 6
0000000000 65536 f 
0000000016 00000 n 
0000000062 00000 n 
0000000114 00000 n 
0000000231 00000 n 
0000008387 00000 n 

trailer
<</Size 6/Root 1 0 R>>
startxref
8471
%%EOF
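
The MuTool cleaning step mentioned above can also be run from Java once the raw file has been written. The following is only a sketch, assuming the `mutool` executable is installed and on the PATH. The class name `MuToolClean` and its methods are hypothetical. It invokes the tool's `clean` subcommand with an input and an output file.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.List;

/** Hypothetical wrapper (not from the original post) around "mutool clean". */
final class MuToolClean {

    /** Builds the command line: mutool clean <input> <output>. */
    static List<String> command(Path input, Path output) {
        return List.of("mutool", "clean", input.toString(), output.toString());
    }

    /** Runs the command and returns the exit code of the mutool process. */
    static int run(Path input, Path output) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command(input, output))
                .inheritIO() // show mutool's warnings in the console
                .start();
        return process.waitFor();
    }
}
```

Writing the repaired file to a separate output path keeps the hand-made original available for comparison.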

Posted by huangapple on 2023-07-11 03:04:58
Original link: https://go.coder-hub.com/76656620.html