Edit binary data in PDF with SED / BBE (change colors in a PDF)
Question
I want to change some background colors in a batch of PDFs.
I found that the color information is stored in the first stream ... endstream block, in a format such as 1 1 1 sc, which in this example represents white (#FFFFFF).
Here is an example after decoding the binary stream with:
qpdf --qdf --object-streams=disable IN.pdf OUT.pdf
stream
q Q q /Cs1 cs 0.9686275 0.9725490 0.9764706 sc 0 12777 m 600 12777 l 600 0
l 0 0 l h f 0 12777 m 600 12777 l 600 0 l 0 0 l h f ➡️1 1 1 sc⬅️ 0 12575 m 600
12575 l 600 12308 l 0 12308 l h f 0.1254902 0.2666667 0.3921569 sc 0 872 m
600 872 l 600 462 l 0 462 l h f 0 462 m 600 462 l 600 0 l 0 0 l h f ➡️1 1 1
sc⬅️ 0 12297 m 600 12297 l 600 5122 l 0 5122 l h f 0.7411765 0.8980392 0.9725490
sc 23 7249 m 577 7249 l 577 6007 l 23 6007 l h f 1 0.9215686 0.9333333 sc
23 5848 m 577 5848 l 577 5533 l 23 5533 l h f 0.9686275 0.9725490 0.9764706
sc 23 5510 m 577 5510 l 577 5156 l 23 5156 l h f ➡️1 1 1 sc⬅️ 0 5110 m 600 5110
...
endstream
If I open the PDF in TextEdit and manually replace 1 1 1 sc with 0 1 0 sc, my white background changes to green as soon as I save the PDF file.
How can I do this in an automated way with a text tool?
- sed 's/1 1 1 sc/0 1 0 sc/' IN.pdf > OUT.pdf
  gives me the error: sed: RE error: illegal byte sequence
- bbe -e 's/0 1 1 sc/0 1 0 sc/' IN.pdf > OUT.pdf
  gives no errors and OUT.pdf is written, but no colors have changed.
echo 'hello 1 1 1 sc world' | bbe -e 's/1 1 1 sc/0 1 0 sc/'
seems to work fine...
In the stream above (the first stream block of the one-page PDF file), I need to replace only the second and third matches. Does the second one contain a line break?
Answer 1
Score: 2
It is not completely clear what you are doing.
You mention these commands:
qpdf --qdf --object-streams=disable IN.pdf OUT.pdf
sed 's/1 1 1 sc/0 1 0 sc/' IN.pdf > OUT.pdf
bbe -e 's/0 1 1 sc/0 1 0 sc/' IN.pdf > OUT.pdf
It is not obvious whether IN.pdf in the sed and bbe commands is the same IN.pdf file as in the qpdf command.
If all three commands use the same file as input, then bbe is searching the original file rather than the decoded output of qpdf, and the stream there is probably still compressed. That could explain why bbe fails.
Another possibility is that the bbe command shown is the command you actually ran and not a typo. It searches for 0 1 1 sc, not for the string 1 1 1 sc.
sed is not designed to work with binary data.
Although the GNU implementation has a non-standard -z option to help read binary files, it still works on a form of "lines". Perl can be used as an improved sed here.
To change only the first three instances of the string 1 1 1 sc in the file, you could try:
qpdf --qdf --object-streams=disable IN.pdf - |\
perl -0777 -pe 'for $i (1..3) { s/1 1 1 sc/0 1 0 sc/ }' |\
qpdf - OUT.pdf
In this Perl command:
-0777: treat the entire input as a single record
-pe: run the command on each record, then print (like sed)
for $i (1..3) { ... }: run three times
s/.../.../: similar to sed's s/// command
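The question asks for only the second and third matches, one of which may be split across a line break. Here is a minimal Python sketch of that variation, assuming the input is the already decoded (QDF) file read as bytes. The names `WHITE_SC` and `recolor_occurrences` are made up for this illustration. The pattern allows any whitespace between tokens and keeps that whitespace in the replacement, so the byte length of the data does not change.

```python
import re

# "1 1 1 sc" with any whitespace (including a line break) between tokens.
# The lookbehind skips a leading "1" that is the tail of a number such as
# "0.1"; the trailing \b keeps "scn" from matching.
# Limitation: a four-number operand list ending in "1 1 1 sc" would still match.
WHITE_SC = re.compile(rb"(?<![\d.])1(\s+)1(\s+)1(\s+)sc\b")


def recolor_occurrences(data: bytes, which: set[int]) -> bytes:
    """Turn the selected (1-based) occurrences of white into green."""
    count = 0

    def swap(match: re.Match) -> bytes:
        nonlocal count
        count += 1
        if count not in which:
            return match.group(0)  # leave this occurrence untouched
        # Reuse the original whitespace so the byte length is unchanged.
        return (b"0" + match.group(1) + b"1" + match.group(2)
                + b"0" + match.group(3) + b"sc")

    return WHITE_SC.sub(swap, data)
```

To use it, read the decoded file with `open(path, "rb")`, call `recolor_occurrences(data, {2, 3})`, and write the result back in binary mode. The occurrence numbers count matches across the whole file, not just the first stream, so check them against the real document.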
Answer 2
Score: 0
I think I will tackle this task with PikePDF, a Python library that seems to be able to work with content streams: https://pikepdf.readthedocs.io/en/latest/topics/content_streams.html
I was just able to pretty-print the content stream with:
#!/usr/bin/env python
import pikepdf

with pikepdf.open('IN.pdf') as pdf:
    page = pdf.pages[0]
    # Parse the page's content stream into (operands, operator) instructions
    instructions = pikepdf.parse_content_stream(page)
    # Serialize the instructions back to bytes, one instruction per line
    data = pikepdf.unparse_content_stream(instructions)
    print(data.decode('ascii'))
Now I am working my way toward actually editing the content stream.
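One possible way to do the edit is sketched below, under several assumptions. It assumes each parsed instruction unpacks into an (operands, operator) pair and that str(operator) yields the operator name. It also assumes unparse_content_stream accepts plain tuples with Python numbers as operands. The helper names `recolor` and `recolor_pdf` are made up for this illustration, and the pikepdf behavior should be checked against the installed version.

```python
def recolor(instructions, old, new, which):
    """Return a new instruction list in which the selected (1-based)
    occurrences of `old ... sc` are replaced by `new ... sc`."""
    out, seen = [], 0
    for instr in instructions:
        operands, operator = instr
        # Assumption: str(operator) gives the operator name, e.g. "sc".
        if (str(operator) == "sc" and len(operands) == len(old)
                and [float(x) for x in operands] == [float(x) for x in old]):
            seen += 1
            if seen in which:
                # Keep the original operator object, swap only the numbers.
                out.append((list(new), operator))
                continue
        out.append(instr)  # everything else passes through unchanged
    return out


def recolor_pdf(src, dst):
    """Recolor the 2nd and 3rd white fills on the first page of `src`."""
    import pikepdf  # third-party; imported here so recolor() stands alone

    with pikepdf.open(src) as pdf:
        page = pdf.pages[0]
        instructions = pikepdf.parse_content_stream(page)
        edited = recolor(instructions, old=[1, 1, 1], new=[0, 1, 0],
                         which={2, 3})
        # Replace the page's content with the re-serialized instructions.
        page.Contents = pdf.make_stream(pikepdf.unparse_content_stream(edited))
        pdf.save(dst)
```

Calling `recolor_pdf('IN.pdf', 'OUT.pdf')` would then write the edited file. Because this works on parsed instructions rather than raw text, line breaks inside the stream no longer matter.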
Here is the stream fragment from my question, pretty-printed:
q
Q
q
/Cs1 cs
0.9686275 0.9725490 0.9764706 sc
0 12777 m
600 12777 l
600 0 l
0 0 l
h
f
0 12777 m
600 12777 l
600 0 l
0 0 l
h
f
➡️1 1 1 sc⬅️
0 12575 m
600 12575 l
600 12308 l
0 12308 l
h
f
0.1254902 0.2666667 0.3921569 sc
0 872 m
600 872 l
600 462 l
0 462 l
h
f
0 462 m
600 462 l
600 0 l
0 0 l
h
f
➡️1 1 1 sc⬅️
0 12297 m
600 12297 l
600 5122 l
0 5122 l
h
f
0.7411765 0.8980392 0.9725490 sc
23 7249 m
577 7249 l
577 6007 l
23 6007 l
h
f
1 0.9215686 0.9333333 sc
23 5848 m
577 5848 l
577 5533 l
23 5533 l
h
f
0.9686275 0.9725490 0.9764706 sc
23 5510 m
577 5510 l
577 5156 l
23 5156 l
h
f
➡️1 1 1 sc⬅️
0 5110 m
600 5110
Some more information about the color values:
Just divide each RGB value by 255, for example:
DeepSkyBlue = #00bfff = RGB(0, 191, 255)
0/255 = 0
191/255 = 0.7490196
255/255 = 1
which gives: 0 0.7490196 1 sc
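The same arithmetic can be wrapped in a small helper. This is a sketch only: the name `hex_to_sc` is made up for this illustration, and it assumes the current color space (here /Cs1) takes three RGB-like components.

```python
def hex_to_sc(hex_color: str) -> str:
    """Convert '#rrggbb' into the operand string for the PDF 'sc' operator."""
    digits = hex_color.lstrip("#")
    if len(digits) != 6:
        raise ValueError("expected a color of the form #rrggbb")
    parts = []
    for i in (0, 2, 4):
        value = int(digits[i:i + 2], 16) / 255  # scale 0..255 to 0..1
        # Seven decimals as in the stream, without trailing zeros.
        parts.append(f"{value:.7f}".rstrip("0").rstrip("."))
    return " ".join(parts) + " sc"


print(hex_to_sc("#00bfff"))  # 0 0.7490196 1 sc
```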