Edit binary data in PDF with SED / BBE (change colors in a PDF)

Question

I want to change some background colors in a batch of PDFs.

I found out that the color information is stored in the first stream - endstream block, in a format such as 1 1 1 sc, which in this example represents white (#FFFFFF).

Here is an example after I decode the binary stream with:
qpdf --qdf --object-streams=disable IN.pdf OUT.pdf

stream
q Q q /Cs1 cs 0.9686275 0.9725490 0.9764706 sc 0 12777 m 600 12777 l 600 0
l 0 0 l h f 0 12777 m 600 12777 l 600 0 l 0 0 l h f ➡️1 1 1 sc⬅️ 0 12575 m 600
12575 l 600 12308 l 0 12308 l h f 0.1254902 0.2666667 0.3921569 sc 0 872 m
600 872 l 600 462 l 0 462 l h f 0 462 m 600 462 l 600 0 l 0 0 l h f ➡️1 1 1
sc⬅️ 0 12297 m 600 12297 l 600 5122 l 0 5122 l h f 0.7411765 0.8980392 0.9725490
sc 23 7249 m 577 7249 l 577 6007 l 23 6007 l h f 1 0.9215686 0.9333333 sc
23 5848 m 577 5848 l 577 5533 l 23 5533 l h f 0.9686275 0.9725490 0.9764706
sc 23 5510 m 577 5510 l 577 5156 l 23 5156 l h f ➡️1 1 1 sc⬅️ 0 5110 m 600 5110
...
endstream

If I open the PDF in TextEdit and manually replace 1 1 1 sc with 0 1 0 sc, my white background immediately changes to green after saving the PDF file.

How can I do this in an automated way with a text tool?

  1. sed 's/1 1 1 sc/0 1 0 sc/' IN.pdf > OUT.pdf
    gives me the error: sed: RE error: illegal byte sequence

  2. bbe -e 's/0 1 1 sc/0 1 0 sc/' IN.pdf > OUT.pdf
    gives no errors and OUT.pdf is written, but no colors have changed.
    echo 'hello 1 1 1 sc world' | bbe -e 's/1 1 1 sc/0 1 0 sc/'
    seems to work fine...

In the above stream (the first stream block) of the 1-page PDF file, I need to replace only the second and third matches. The second one seems to be split by a line break?

Answer 1

Score: 2

It is not completely clear what you are doing.

You mention commands:

qpdf --qdf --object-streams=disable IN.pdf OUT.pdf

sed 's/1 1 1 sc/0 1 0 sc/' IN.pdf > OUT.pdf

bbe -e 's/0 1 1 sc/0 1 0 sc/' IN.pdf > OUT.pdf

It is not obvious whether IN.pdf in the sed and bbe commands is the same IN.pdf file as in the qpdf command.

If all three commands use the same file as input, that could explain why bbe fails: in the original file the content stream is presumably still compressed, so the literal text 1 1 1 sc never appears in it.

Another possibility is that the bbe command shown is the command you are actually using and not a typo. It searches for 0 1 1 sc, not for the string 1 1 1 sc.
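
To illustrate the first possibility, here is a minimal Python sketch. It assumes the original PDF stores its content stream with Flate (zlib) compression, which is common but not confirmed for the file in the question, and the sample bytes are a made-up stand-in rather than data from a real PDF:

```python
import zlib

# Hypothetical stand-in for a page content stream (not taken from a real PDF).
plain = b"1 1 1 sc 0 12575 m 600 12575 l 600 12308 l 0 12308 l h f\n" * 20
needle = b"1 1 1 sc"

# Assumption: the stream is stored Flate-compressed, which is zlib/deflate data.
compressed = zlib.compress(plain)

print(needle in plain)                        # the text is there before compression
print(needle in compressed)                   # a byte-level search finds nothing here
print(needle in zlib.decompress(compressed))  # and finds it again after decoding
```

A byte-level tool such as bbe can only match the second form if the literal bytes happen to occur in it, which is why decoding with qpdf --qdf first matters.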


sed is not designed to work with binary data. Although the GNU implementation has a non-standard -z option to help read binary files, it still works on a form of "lines". Perl can be used as an improved sed here.

To change only the first three instances of the string 1 1 1 sc in the file, you could try:

qpdf --qdf --object-streams=disable IN.pdf - |\
perl -0777 -pe 'for $i (1..3) { s/1 1 1 sc/0 1 0 sc/ }' |\
qpdf - OUT.pdf

In this Perl command:

  • -0777 - treat entire input as single record
  • -pe - run command on each record, then print (like sed)
  • for $i (1..3) { ... } - run three times
  • s/.../.../ - similar to sed's s/// command
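
The question asks for only the second and third matches, one of which is split by a line break. A variation on the same idea can be sketched in Python. The function names recolor and recolor_file are made up for this example, and the input is assumed to be the uncompressed QDF output of the qpdf command above:

```python
import re
from pathlib import Path

# "1 1 1 sc" with any whitespace (including line breaks) between the tokens.
# The lookbehind rejects "0.1 1 1 sc" and "11 1 1 sc"; the lookahead rejects "scn".
WHITE_FILL = re.compile(rb"(?<![\d.])1(\s+)1(\s+)1(\s+)sc(?![A-Za-z*])")


def recolor(data: bytes, occurrences=(2, 3)) -> bytes:
    """Turn the selected (1-based) matches of '1 1 1 sc' into '0 1 0 sc'."""
    seen = 0

    def swap(match):
        nonlocal seen
        seen += 1
        if seen not in occurrences:
            return match.group(0)  # leave this match untouched
        ws1, ws2, ws3 = match.groups()
        # Reuse the original whitespace so the stream keeps its byte length.
        return b"0" + ws1 + b"1" + ws2 + b"0" + ws3 + b"sc"

    return WHITE_FILL.sub(swap, data)


def recolor_file(src: str, dst: str) -> None:
    """Hypothetical helper: src is assumed to be a QDF file written by qpdf."""
    Path(dst).write_bytes(recolor(Path(src).read_bytes()))


sample = b"0.1 1 1 sc f 1 1 1 sc f 1 1 1\nsc f 1 1 1 sc f 1 1 1 sc"
print(recolor(sample))
```

This is still a plain byte search rather than a PDF parser, so it counts matches across the whole file and would also hit the tail of a four-operand color such as 1 1 1 1 sc.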

Answer 2

Score: 0

I think I will tackle this task with PikePDF, a Python library which seems to be able to work with content streams: https://pikepdf.readthedocs.io/en/latest/topics/content_streams.html

I was just able to pretty-print the content stream using:

#!/usr/bin/env python

import pikepdf

with pikepdf.open('IN.pdf') as pdf:
    page = pdf.pages[0]
    # Parse the page's content stream into (operands, operator) instructions
    instructions = pikepdf.parse_content_stream(page)
    # Serialize the instructions back to bytes
    data = pikepdf.unparse_content_stream(instructions)
    print(data.decode('ascii'))

Now I am working my way toward actually editing the content stream.

Here is the stream fragment from my question, pretty-printed:

q
Q
q
/Cs1 cs
0.9686275 0.9725490 0.9764706 sc
0 12777 m
600 12777 l
600 0 l
0 0 l
h
f
0 12777 m
600 12777 l
600 0 l
0 0 l
h
f
➡️1 1 1 sc⬅️
0 12575 m
600 12575 l
600 12308 l
0 12308 l
h
f
0.1254902 0.2666667 0.3921569 sc
0 872 m
600 872 l
600 462 l
0 462 l
h
f
0 462 m
600 462 l
600 0 l
0 0 l
h
f
➡️1 1 1 sc⬅️
0 12297 m
600 12297 l
600 5122 l
0 5122 l
h
f
0.7411765 0.8980392 0.9725490 sc
23 7249 m
577 7249 l
577 6007 l
23 6007 l
h
f
1 0.9215686 0.9333333 sc
23 5848 m
577 5848 l
577 5533 l
23 5533 l
h
f
0.9686275 0.9725490 0.9764706 sc
23 5510 m
577 5510 l
577 5156 l
23 5156 l
h
f
➡️1 1 1 sc⬅️
0 5110 m
600 5110
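
Since each instruction is now a separate (operands, operator) pair, one possible next step is to rewrite the matching pairs and write the stream back. The following is only a sketch under assumptions: the helper names, the only parameter, and the default colors are invented for this example, and the wrapper has not been run against the PDF from the question.

```python
def recolor_instructions(instructions, old=(1, 1, 1), new=(0, 1, 0),
                         op="sc", only=None):
    """Return a new instruction list with matching fill colors replaced.

    `instructions` is any iterable of (operands, operator) pairs, such as the
    result of pikepdf.parse_content_stream(). `only` is an optional set of
    1-based match numbers; None means every match is replaced.
    """
    result = []
    seen = 0
    for operands, operator in instructions:
        is_match = (
            str(operator) == op
            and len(operands) == len(old)
            and all(float(a) == b for a, b in zip(operands, old))
        )
        if is_match:
            seen += 1
            if only is None or seen in only:
                operands = list(new)
        result.append((operands, operator))
    return result


def recolor_first_page(src, dst, only=None):
    """Hypothetical wrapper around the helper above, for page 1 only."""
    import pikepdf  # imported lazily so the helper above works without it

    with pikepdf.open(src) as pdf:
        page = pdf.pages[0]
        edited = recolor_instructions(
            pikepdf.parse_content_stream(page), only=only
        )
        # Replace the page's content with the re-serialized instructions.
        page.Contents = pdf.make_stream(pikepdf.unparse_content_stream(edited))
        pdf.save(dst)
```

With these assumptions, recolor_first_page('IN.pdf', 'OUT.pdf', only={2, 3}) would target the second and third white fills from the fragment above.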

Some more info about the color values: just divide each RGB component by 255. For example:

DeepSkyBlue = #00bfff = RGB(0, 191, 255)
0/255 = 0
191/255 = 0.7490196
255/255 = 1

0 0.7490196 1 sc
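
The same arithmetic can be wrapped in a small helper. This is a sketch: the name hex_to_sc and the 7-digit precision are choices made for this example, and it assumes the active color space (/Cs1 in the question) takes three RGB-style components.

```python
def hex_to_sc(hex_color: str, digits: int = 7) -> str:
    """Convert '#rrggbb' into an 'r g b sc' fill-color instruction."""
    value = hex_color.lstrip("#")
    if len(value) != 6:
        raise ValueError(f"expected 6 hex digits, got {hex_color!r}")
    parts = []
    for i in (0, 2, 4):
        component = int(value[i:i + 2], 16) / 255
        # Fixed-point text, with trailing zeros trimmed ("1.0000000" -> "1").
        text = f"{component:.{digits}f}".rstrip("0").rstrip(".")
        parts.append(text)
    return " ".join(parts) + " sc"


print(hex_to_sc("#00bfff"))  # 0 0.7490196 1 sc
print(hex_to_sc("#FFFFFF"))  # 1 1 1 sc
```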
