Adding images in raw PDF
Question
I am trying to make PDFs manually. I have the basics, but the one thing I cannot figure out is images.
What I am trying right now, as a start, is adding a simple image in the form of binary code. The binary code has a length of 9 and should be able to represent a 3x3 black-and-white image. The code is 111000111 (this should just be a black horizontal line through the middle). Of course this is oversimplified, not compressed and not usable for more complex images, but I am desperate and just want to display SOMETHING :).
I hope someone can help and teach me more about this topic.
My new PDF (after applying johnwhitington's comments, except for point b):
%PDF-1.7
1 0 obj
<<
/Pages 2 0 R
/Type /Catalog >>
endobj
2 0 obj
<<
/Type /Pages
/Count 1
/Kids [3 0 R] >>
endobj
3 0 obj
<<
/Type /Page
/Parent 2 0 R
/Contents [4 0 R]
/MediaBox [500 500]
/Resources
<<
/XObject
<<
/Im1 5 0 R
>>
>>
>>
endobj
4 0 obj
<<
>>
stream
q
1 0 0 1 100 100 cm
/Im1 Do
Q
endstream
endobj
5 0 obj
<<
/Type /XObject
/Subtype /Image
/Height 3
/Width 3
/BitsPerComponent 1
/Length 9
/ColorSpace /DeviceGray
>>
stream111000111endstream
endobj
trailer
<< /Root 1 0 R
/Size 7
>>
%%EOF
My old PDF:
%PDF-1.7
1 0 obj
<<
/Pages 2 0 R
/Type /Catalog >>
endobj
2 0 obj
<<
/Type /Pages
/Count 1
/Kids [3 0 R] >>
endobj
3 0 obj
<<
/Type /Page
/Parent 2 0 R
/Contents [4 0 R]
/Resources
<<
/ProcSet [/PDF /ImageB]
/XObject
<<
/Im1 5 0 R
>>
>>
>>
endobj
4 0 obj
<<
>>
stream
q
1 0 0 1 0 0 cm
/Im1 DO
Q
endstream
endobj
5 0 obj
<<
/Type /XObject
/Subtype /Image
/Height 3
/Width 3
/BitsPerComponent 1
/Length 9
/ColorSpace /DeviceGray
>>
stream111000111endstream
endobj
trailer
<< /Root 1 0 R
/Size 7
>>
%%EOF
Answer 1
Score: 2
@johnwhitington has covered the basics of image programming in a PDF.
An image is usually placed and scaled with four operators (q, cm, Do, Q) inside a q ... Q block in the page's content stream, as in this working single line:
q 192 0 0 192 100 100 cm /Img0 Do Q
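Here the two 192s are the horizontal and vertical scale (the displayed width and height in user-space units), the two 0s are the rotation/skew terms, 100 100 is the x, y position of the image's lower-left corner, Img0 is the image's name in the page resources, and Do is the operator that paints the XObject. None of this describes the real pixel size of Img0; that comes from /Width and /Height in the image dictionary.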
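As a hedged illustration of the matrix just described, this Python sketch (the helper names `apply_cm` and `place_image` are hypothetical) pushes the corners of the unit square through a `cm` matrix and rebuilds the single-line example. It relies on the PDF rule that an image is always painted into the unit square of the current user space, so the `cm` matrix alone decides the on-page size.

```python
# Sketch: how a "cm" matrix [a b c d e f] maps an image onto the page.
# PDF paints every image into the unit square (0,0)-(1,1) of user space,
# so a and d act as displayed width/height, b and c are rotation/skew,
# and e and f are the offset of the lower-left corner.

def apply_cm(matrix, x, y):
    """Map point (x, y) through a PDF matrix [a b c d e f]."""
    a, b, c, d, e, f = matrix
    return (a * x + c * y + e, b * x + d * y + f)


def place_image(name, x, y, width, height):
    """Content-stream snippet drawing XObject /name as a width x height box at (x, y)."""
    return f"q {width} 0 0 {height} {x} {y} cm /{name} Do Q"


if __name__ == "__main__":
    cm = (192, 0, 0, 192, 100, 100)
    print(apply_cm(cm, 0, 0))  # (100, 100): lower-left corner of the image
    print(apply_cm(cm, 1, 1))  # (292, 292): upper-right corner of the image
    print(place_image("Img0", 100, 100, 192, 192))
```

Without a scaling `cm`, a 3x3 image would occupy just 1 x 1 user-space unit, which is why it is easy to miss on screen.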
For an example of inserting a JPEG using a hybrid text approach (after the text is prepared), see https://stackoverflow.com/a/75710613/10802527
So you need to inject the image pixels using whatever method works in your editor. For RGB images, JPEG import is simplest; however, JPEG is not ideal for a hand-written PDF because its data is pure binary, and most text editors cannot handle pure binary input. So binary data such as JPEG images (or MP4 video) needs to be converted into a text-safe format such as textual hex (00 01 02 03 etc.), so that all 256 byte values can be written in an ANSI editor such as Notepad.
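A minimal sketch of that conversion, assuming the stream is then declared with `/Filter /ASCIIHexDecode` (or `[/ASCIIHexDecode /DCTDecode]` for a JPEG, so the hex layer is removed first). The function name `to_ascii_hex` and the 64-character line width are illustrative choices; ASCIIHexDecode ignores white space and treats `>` as end-of-data.

```python
# Sketch: turn arbitrary binary data (e.g. JPEG bytes) into text-safe
# ASCII hex that a plain-text editor can hold. Line breaks are ignored
# by ASCIIHexDecode; ">" marks the end of the data.

def to_ascii_hex(data: bytes, line_width: int = 64) -> str:
    """Hex-encode data, wrap it for readability, append the EOD marker."""
    hex_text = data.hex().upper()
    lines = [hex_text[i:i + line_width] for i in range(0, len(hex_text), line_width)]
    return "\n".join(lines) + ">"


if __name__ == "__main__":
    sample = bytes([0x00, 0x01, 0x02, 0x03, 0xFF])
    print(to_ascii_hex(sample))  # 00010203FF>
```

Note that `/Length` must then count the hex text (including `>`), not the original binary size.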
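In byte terms: in a 1-bit DeviceGray image a 0 bit is black and a 1 bit is white; with 8 bits per component, black is 00 and white is FF, and for RGB black is 000000 and white is FFFFFF.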
Right, so to write those pixels, set a pointer to an object in the page resources, perhaps with an entry like <</XObject <</Img0 6 0 R>>>>.
The object 6 0 obj then needs declarations such as the number of pixels, the color space and the encoding type:
6 0 obj <</Type/XObject/Subtype/Image/ColorSpace/Device...
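To make those declarations concrete, here is a hedged Python sketch of a hypothetical helper, `image_object`, that writes an image XObject with the usual keys. It derives `/Length` from the actual data so the two cannot drift apart; the optional filter name is an illustrative parameter, not part of any library.

```python
# Hypothetical helper: build the bytes of an image XObject object.
# /Length is taken from the data itself, so it always matches.

def image_object(obj_num: int, width: int, height: int, bpc: int,
                 colorspace: str, data: bytes, filter_name: str = "") -> bytes:
    entries = [
        "/Type /XObject",
        "/Subtype /Image",
        f"/Width {width}",           # pixels per row
        f"/Height {height}",         # number of rows
        f"/BitsPerComponent {bpc}",
        f"/ColorSpace /{colorspace}",
        f"/Length {len(data)}",      # bytes between "stream" EOL and "endstream" EOL
    ]
    if filter_name:                  # e.g. "ASCIIHexDecode" or "DCTDecode"
        entries.append(f"/Filter /{filter_name}")
    header = f"{obj_num} 0 obj\n<<\n" + "\n".join(entries) + "\n>>\nstream\n"
    # The header is plain ASCII; the pixel data is appended as raw bytes.
    return header.encode("latin-1") + data + b"\nendstream\nendobj\n"


obj = image_object(6, 3, 3, 1, "DeviceGray", b"\xe0\x00\xe0")
print(obj[:obj.index(b"stream")].decode("ascii"))  # the dictionary part only
```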
In your example we can see:
16 0 obj
<<
/Type /XObject
/Subtype /Image
/Height 3
/Width 3
/BitsPerComponent 1
/Length 9
/ColorSpace /DeviceGray
>>
stream
­ 
endstream
Note that, even though nothing visible sits between stream and endstream, this produces an odd image whose pixel rows read 001, 101, 101. Why?
Because that is what is really there in the binary stream; it is simply not visible in an ANSI editor like Notepad.
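The characters are the bytes 20 AD A0, where 20 is a space, AD is a soft hyphen and A0 is a non-breaking space, so those bytes look like
00100000
10101101
10100000
Each 3-pixel row starts on a new byte, so only the first 3 bits of each byte are used: 001, 101, 101.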
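The row-by-row reading above can be checked with a small Python sketch. The function `rows_from_1bit` is a hypothetical helper; it follows the PDF rule that each row of sample data begins on a byte boundary, with leftover bits in the last byte of a row being padding.

```python
# Sketch: read a 1-bit-per-pixel image stream row by row, the way a
# viewer does. Each row starts on a byte boundary; bits past `width`
# in a row's last byte are padding and are ignored.

def rows_from_1bit(data: bytes, width: int, height: int) -> list:
    bytes_per_row = (width + 7) // 8
    rows = []
    for r in range(height):
        chunk = data[r * bytes_per_row:(r + 1) * bytes_per_row]
        bits = "".join(f"{b:08b}" for b in chunk)  # e.g. 0x20 -> "00100000"
        rows.append(bits[:width])
    return rows


print(rows_from_1bit(b"\x20\xad\xa0", 3, 3))  # ['001', '101', '101']
print(rows_from_1bit(b"\x20\xad\xa0", 6, 3))  # ['001000', '101011', '101000']
```

In DeviceGray with the default decoding, each 0 is painted black and each 1 white.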
Let's test that by widening the image to 6 pixels per row; as expected, the rows now show the first 6 bits of each byte.
So the core issue is that in a text format the bytes are taken literally as bits, NOT as visible 0 and 1 characters, which is inconvenient for handling imagery. At this level, what we need is to start using an encoding such as ASCII hex (/ASCIIHexDecode).
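As a sketch of what that looks like for the question's image, assuming DeviceGray at 1 bit per component and object number 5 as in the question: the padded rows 111/000/111 are the bytes E0 00 E0, and with `/Filter /ASCIIHexDecode` the whole stream becomes printable text. `/Length` counts the encoded characters, here 7.

```python
# Sketch: the question's 3x3 image written entirely in printable text.
raw = bytes([0b11100000, 0b00000000, 0b11100000])  # rows 111/000/111, padded
hex_data = raw.hex().upper() + ">"                  # ">" = ASCIIHexDecode end-of-data

image_obj = (
    "5 0 obj\n<<\n/Type /XObject\n/Subtype /Image\n/Width 3\n/Height 3\n"
    "/BitsPerComponent 1\n/ColorSpace /DeviceGray\n/Filter /ASCIIHexDecode\n"
    f"/Length {len(hex_data)}\n>>\nstream\n{hex_data}\nendstream\nendobj\n"
)
print(image_obj)

# Decoding reverses the filter: drop the EOD marker and parse the hex.
assert bytes.fromhex(hex_data.rstrip(">")) == raw
```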
Answer
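So an image stream is a bitstream stored as a bytestream, and each row starts on a new byte. You want 111000111, which as three rows of three pixels is
111
000
111
so, with each row padded to a full byte, it is
11100000
00000000
11100000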
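The padding step can be sketched in Python with a hypothetical helper, `pack_1bit_rows`, that turns a string of '0'/'1' pixels into 1-bit image data, padding each row with zero bits up to a byte boundary.

```python
# Sketch: pack a '0'/'1' pixel string into PDF 1-bit image data.
# Each row is padded with 0 bits to a whole byte, as the format requires.

def pack_1bit_rows(bits: str, width: int) -> bytes:
    if len(bits) % width:
        raise ValueError("bit string is not a whole number of rows")
    out = bytearray()
    for start in range(0, len(bits), width):
        row = bits[start:start + width]
        row += "0" * (-len(row) % 8)                 # pad row to a byte boundary
        out += int(row, 2).to_bytes(len(row) // 8, "big")
    return bytes(out)


print(pack_1bit_rows("111000111", 3).hex())  # e000e0
```

So the question's image needs `/Length 3`, not 9.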
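It is then easy to use this or similar, with /Length 3:
stream
àà
endstream
where there is an invisible NUL character (byte 00, the row of black pixels) between the two à characters. Saved as ANSI (Windows-1252), à is the single byte E0 (11100000); saved as UTF-8 it would become two bytes and break the image.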
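A quick Python check of that encoding point, assuming "ANSI" means Windows-1252 (the `cp1252` codec): the same three characters give exactly the intended bytes in cp1252 but five different bytes in UTF-8.

```python
# Sketch: why the editor's encoding matters for "à<NUL>à".
stream_text = "à\x00à"                  # à, NUL (the black row), à

ansi = stream_text.encode("cp1252")     # Windows "ANSI": one byte per character
utf8 = stream_text.encode("utf-8")      # à becomes two bytes here

print(ansi.hex(), len(ansi))   # e000e0 3
print(utf8.hex(), len(utf8))   # c3a000c3a0 5
```

Writing the PDF from a program in binary mode avoids the editor's encoding altogether.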
The result will be pixel perfect, but it shows that black 00000000 bytes are not easy to write in ANSI text; ideally they need hex encoding. One trick for RGB is to use a space for black: a space is \x20, which is about as dark as a "safe" ANSI text character can be as a byte value. So at 8 bits per component, a white pixel is ÿÿÿ (FF FF FF) and a blackish pixel is three spaces (20 20 20).
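Hence, for a similar but blackish result, we could use the following (saved as ANSI, with nine spaces forming the middle row):
6 0 obj
<<
/Type /XObject
/Subtype /Image
/Height 3
/Width 3
/BitsPerComponent 8
/Length 27
/ColorSpace /DeviceRGB
>>
stream
ÿÿÿÿÿÿÿÿÿ         ÿÿÿÿÿÿÿÿÿ
endstream
endobj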
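A minimal Python sketch of that 27-byte RGB stream, assuming '1' means white and '0' means the blackish space pixel; `rgb_rows`, `WHITE` and `BLACKISH` are hypothetical names. A space byte is 32/255, roughly 13% brightness, hence "blackish" rather than black.

```python
# Sketch: build the 3x3 DeviceRGB stream, 8 bits per component.
WHITE = b"\xff\xff\xff"      # appears as three y-diaeresis characters in Windows-1252
BLACKISH = b"\x20\x20\x20"   # three spaces: 32/255 per component

def rgb_rows(pattern: str) -> bytes:
    """Map '1' to a white pixel and '0' to a blackish pixel."""
    return b"".join(WHITE if p == "1" else BLACKISH for p in pattern)


data = rgb_rows("111000111")
print(len(data))                                      # 27, matching /Length 27
print(data == b"\xff" * 9 + b" " * 9 + b"\xff" * 9)   # True
```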
Answer 2
Score: 1
A few hints:
a) You need Do not DO.
b) If it's one bit per component and nine pixels in three rows of three, there will be three bytes in the image (each row is padded to a whole byte), not nine bytes. '1' and '0' are 8-bit characters, not bits.
c) You don't need ProcSet these days.
d) To see your image on the screen, you'll want something like 100 0 0 100 100 100 cm to scale it up so it's visible (1 0 0 1 100 100 cm only moves it; the image would still be 1 x 1 unit).
e) Your page needs a /MediaBox with four numbers (lower-left x, y and upper-right x, y), e.g. /MediaBox [0 0 500 500].
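Putting the hints together, here is a hedged Python sketch (not the answerer's code) that writes a complete one-page PDF showing the question's 3x3 image: `Do` rather than `DO`, three bytes of padded image data, a `cm` that scales the image to 300 x 300 units, a four-number `/MediaBox`, and an xref table with correct byte offsets so strict readers need no repair. The output file name `image3x3.pdf` is an arbitrary choice.

```python
# Sketch: a minimal valid one-page PDF with a 3x3 1-bit DeviceGray image.
IMAGE = bytes([0b11100000, 0b00000000, 0b11100000])  # rows 111/000/111, padded
CONTENT = b"q\n300 0 0 300 100 100 cm\n/Im1 Do\nQ\n"  # scale up, then paint

def stream_obj(dictionary: bytes, data: bytes) -> bytes:
    """Stream object body with /Length computed from the data."""
    return (b"<< " + dictionary + b" /Length " + str(len(data)).encode()
            + b" >>\nstream\n" + data + b"\nendstream")

OBJECTS = [
    b"<< /Type /Catalog /Pages 2 0 R >>",
    b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
    b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 500 500] "
    b"/Resources << /XObject << /Im1 5 0 R >> >> /Contents 4 0 R >>",
    stream_obj(b"", CONTENT),
    stream_obj(b"/Type /XObject /Subtype /Image /Width 3 /Height 3 "
               b"/BitsPerComponent 1 /ColorSpace /DeviceGray", IMAGE),
]

def build_pdf() -> bytes:
    out = bytearray(b"%PDF-1.7\n%\xe2\xe3\xcf\xd3\n")  # binary-marker comment
    offsets = []
    for num, body in enumerate(OBJECTS, start=1):
        offsets.append(len(out))                      # byte offset of "N 0 obj"
        out += f"{num} 0 obj\n".encode() + body + b"\nendobj\n"
    xref_pos = len(out)
    out += f"xref\n0 {len(OBJECTS) + 1}\n".encode()
    out += b"0000000000 65535 f \n"                  # each entry is 20 bytes
    for off in offsets:
        out += f"{off:010d} 00000 n \n".encode()
    out += (f"trailer\n<< /Size {len(OBJECTS) + 1} /Root 1 0 R >>\n"
            f"startxref\n{xref_pos}\n%%EOF\n").encode()
    return bytes(out)

if __name__ == "__main__":
    with open("image3x3.pdf", "wb") as fh:           # binary mode: bytes kept as-is
        fh.write(build_pdf())
```

Opening the result should show a white square with a black band across the middle third, matching the question's 111000111.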