February 19, 2023 07:11:10

Edit binary data in PDF with SED / BBE (change colors in a PDF)

Question

I want to change some background colors in a batch of PDFs.

I found out that the color information is stored in the first stream - endstream block, in a format like 1 1 1 sc, which in this example represents white (#FFFFFF).

Here is an example after I decode the binary stream with:
qpdf --qdf --object-streams=disable IN.pdf OUT.pdf

<pre>stream
q Q q /Cs1 cs 0.9686275 0.9725490 0.9764706 sc 0 12777 m 600 12777 l 600 0
l 0 0 l h f 0 12777 m 600 12777 l 600 0 l 0 0 l h f ➡️1 1 1 sc⬅️ 0 12575 m 600
12575 l 600 12308 l 0 12308 l h f 0.1254902 0.2666667 0.3921569 sc 0 872 m
600 872 l 600 462 l 0 462 l h f 0 462 m 600 462 l 600 0 l 0 0 l h f ➡️1 1 1
sc⬅️ 0 12297 m 600 12297 l 600 5122 l 0 5122 l h f 0.7411765 0.8980392 0.9725490
sc 23 7249 m 577 7249 l 577 6007 l 23 6007 l h f 1 0.9215686 0.9333333 sc
23 5848 m 577 5848 l 577 5533 l 23 5533 l h f 0.9686275 0.9725490 0.9764706
sc 23 5510 m 577 5510 l 577 5156 l 23 5156 l h f ➡️1 1 1 sc⬅️ 0 5110 m 600 5110
...
endstream
</pre>

If I open the PDF in TextEdit and manually replace 1 1 1 sc with 0 1 0 sc, my white background changes to green as soon as I save the PDF file.

How can I do this in an automated way with a text tool?

sed 's/1 1 1 sc/0 1 0 sc/' IN.pdf > OUT.pdf 
gives me the error: sed: RE error: illegal byte sequence
bbe -e 's/0 1 1 sc/0 1 0 sc/' IN.pdf > OUT.pdf 
no errors, OUT.pdf is written but no colors have changed 
echo 'hello 1 1 1 sc world' | bbe -e 's/1 1 1 sc/0 1 0 sc/'
seems to work fine...

In the above stream (the first stream block of the 1-page PDF file) I need to replace only the second and third matches. Does the second one contain a line break?

Answer 1

Score: 2

It is not completely clear what you are doing.

You mention commands:

qpdf --qdf --object-streams=disable IN.pdf OUT.pdf

sed 's/1 1 1 sc/0 1 0 sc/' IN.pdf > OUT.pdf

bbe -e 's/0 1 1 sc/0 1 0 sc/' IN.pdf > OUT.pdf

It is not obvious if IN.pdf in the sed or bbe commands is the same IN.pdf file as the qpdf command.

If all three commands use the same file as input, that could explain why bbe fails: the content streams in the original IN.pdf are presumably still compressed, so the plain-text string is not there to match.

Another possibility is that the bbe command shown is the command you are actually using and not a typo. It searches for 0 1 1 sc, not for the string 1 1 1 sc.

sed is not designed to work with binary data.
Although the GNU implementation has a non-standard -z option to help read binary files, it still works on a form of "lines". Perl can be used as an improved sed here.

To change only the first three instances of the string 1 1 1 sc in the file, you could try:

qpdf --qdf --object-streams=disable IN.pdf - |\
perl -0777 -pe 'for $i (1..3) { s/1 1 1 sc/0 1 0 sc/ }' |\
qpdf - OUT.pdf

In this Perl command:

-0777 - treat entire input as single record
-pe - run command on each record, then print (like sed)
for $i (1..3) { ... } - run three times
s/.../.../ - similar to sed's s/// command
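
The question asks for only the second and third matches, and its stream dump shows one match wrapped across a line break (1 1 1 at the end of a line, sc at the start of the next). A pattern with literal spaces would miss that one. The following is a minimal Python sketch of a whitespace-tolerant replacement on the decoded bytes. The function name `recolor` and its parameters are illustrative, not part of any tool mentioned above, and it assumes the input has already been decoded with qpdf --qdf.

```python
import re

# "1 1 1 sc" with any whitespace (including a line break) between tokens.
# The lookbehind keeps the first "1" from matching the tail of a longer number.
WHITE_FILL = re.compile(rb"(?<![\d.])1(\s+)1(\s+)1(\s+)sc\b")


def recolor(data: bytes, which: set[int],
            rgb: tuple[bytes, bytes, bytes] = (b"0", b"1", b"0")) -> bytes:
    """Replace only the occurrences whose 1-based index is in `which`."""
    count = 0

    def swap(match: re.Match) -> bytes:
        nonlocal count
        count += 1
        if count not in which:
            return match.group(0)  # leave this occurrence untouched
        r, g, b = rgb
        # Reuse the original whitespace so line breaks stay where they were.
        return r + match.group(1) + g + match.group(2) + b + match.group(3) + b"sc"

    return WHITE_FILL.sub(swap, data)


sample = b"f 1 1 1 sc 0 12575 m f 1 1 1\nsc 0 12297 m f 1 1 1 sc 0 5110 m"
print(recolor(sample, {2, 3}))
```

With single-character color components the replacement has the same byte length as the match, so the offsets in the decoded file are not shifted. To apply it to a file, the bytes could be read and written in binary mode, for example with pathlib.Path.read_bytes and write_bytes.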

Answer 2

Score: 0

I think I will tackle this task with pikepdf, a Python library that seems to be able to work with content streams: https://pikepdf.readthedocs.io/en/latest/topics/content_streams.html

I was just able to pretty-print the content stream using:

#!/usr/bin/env python

import pikepdf

with pikepdf.open('IN.pdf') as pdf:
    page = pdf.pages[0]
    # Parse the page's content stream into (operands, operator) instructions
    instructions = pikepdf.parse_content_stream(page)
    # Serialize the instructions back to bytes, one instruction per line
    data = pikepdf.unparse_content_stream(instructions)
    print(data.decode('ascii'))

Now I am working my way toward actually editing the content stream.
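
One way the editing step might look is sketched below. This is a rough sketch under assumptions, not a tested solution: the helper names `recolor_fills` and `recolor_pdf` are made up for illustration, the page is assumed to be the first one, and the fills are assumed to use exactly the operands 1 1 1. The helper works on plain (operands, operator) pairs so it can be tried without pikepdf. Only `recolor_pdf` needs the library.

```python
def recolor_fills(instructions, sc_operator, old, new, which):
    """Return a new instruction list with selected `old sc` fills set to `new`.

    `instructions` is an iterable of (operands, operator) pairs and `which`
    holds the 1-based occurrence numbers to change.
    """
    result, seen = [], 0
    for operands, operator in instructions:
        operands = list(operands)
        if operator == sc_operator and operands == list(old):
            seen += 1
            if seen in which:
                operands = list(new)
        result.append((operands, operator))
    return result


def recolor_pdf(src, dst, which=frozenset({2, 3})):
    """Hypothetical wrapper: turn the chosen white fills on page 1 green."""
    import pikepdf  # third-party; imported here so the helper stays dependency-free

    with pikepdf.open(src) as pdf:
        page = pdf.pages[0]
        instructions = pikepdf.parse_content_stream(page)
        edited = recolor_fills(
            instructions, pikepdf.Operator("sc"), [1, 1, 1], [0, 1, 0], which
        )
        # Replace the page's content with the re-serialized instructions.
        page.Contents = pdf.make_stream(pikepdf.unparse_content_stream(edited))
        pdf.save(dst)


demo = [([1, 1, 1], "sc"), ([], "f"), ([1, 1, 1], "sc"), ([1, 1, 1], "sc")]
print(recolor_fills(demo, "sc", [1, 1, 1], [0, 1, 0], {2, 3}))
```

Because the stream is parsed into instructions first, a line break between 1 1 1 and sc in the raw bytes no longer matters. A call such as recolor_pdf('IN.pdf', 'OUT.pdf') would then write the modified file.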

Here is the stream fragment from my question, pretty-printed:

q
Q
q
/Cs1 cs
0.9686275 0.9725490 0.9764706 sc
0 12777 m
600 12777 l
600 0 l
0 0 l
h
f
0 12777 m
600 12777 l
600 0 l
0 0 l
h
f
➡️1 1 1 sc⬅️
0 12575 m
600 12575 l
600 12308 l
0 12308 l
h
f
0.1254902 0.2666667 0.3921569 sc
0 872 m
600 872 l
600 462 l
0 462 l
h
f
0 462 m
600 462 l
600 0 l
0 0 l
h
f
➡️1 1 1 sc⬅️
0 12297 m
600 12297 l
600 5122 l
0 5122 l
h
f
0.7411765 0.8980392 0.9725490 sc
23 7249 m
577 7249 l
577 6007 l
23 6007 l
h
f
1 0.9215686 0.9333333 sc
23 5848 m
577 5848 l
577 5533 l
23 5533 l
h
f
0.9686275 0.9725490 0.9764706 sc
23 5510 m
577 5510 l
577 5156 l
23 5156 l
h
f
➡️1 1 1 sc⬅️
0 5110 m
600 5110

Some more info about the color values: just divide each RGB component by 255. For example:

DeepSkyBlue = #00bfff = RGB(0, 191, 255) 
0/255 = 0 
191/255 = 0.7490196 
255/255 = 1

0 0.7490196 1 sc
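
The same arithmetic can be written as a small Python helper. This is an illustrative sketch: the name `hex_to_sc` is hypothetical, and the rounding to seven decimal places is an assumption chosen to match the precision seen in the stream above.

```python
def hex_to_sc(hex_color: str) -> str:
    """Convert a color like '#00bfff' to PDF fill operands such as '0 0.7490196 1 sc'."""
    digits = hex_color.lstrip("#")
    if len(digits) != 6:
        raise ValueError("expected a 6-digit hex color such as #00bfff")
    parts = []
    for i in (0, 2, 4):
        value = int(digits[i:i + 2], 16) / 255
        # Seven decimals, then drop trailing zeros so 0 and 1 stay short.
        parts.append(f"{value:.7f}".rstrip("0").rstrip("."))
    return " ".join(parts) + " sc"


print(hex_to_sc("#00bfff"))  # 0 0.7490196 1 sc
```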
