问题

Sure, here's the translated content:

"我正在研究PDF的内部结构，所以我在LibreOffice Writer中创建了一个文件，只写了字符串“Hello world”，然后将其导出为PDF。然后，我使用以下命令进行解压：pdftk hello_world.pdf output hello_world_unc.pdf uncompress，并在文本编辑器中打开它。

分析流时，我得到了类似以下奇怪的内容：[<01>5<02>-6<03>2<03>2<040506>-2 <040703>2<08>]TJ，这应该表示“Hello world”作为十六进制字符串数组（在尖括号中），以及用整数指定的间距。

我声明文件只包含此字符串，专门用于教育目的。

问题是，它们看起来不像应该的十六进制字符。也就是说，“H”肯定不是用01表示的。
我期望看到类似这样的内容：(Hello world) Tj。

有人能帮助我理解吗？提前感谢。"

英文:

I'm studying the internal structure of pdf, so i created a file in libreoffice writer, writing only the string "Hello world" and exported it to pdf. So I uncompressed it with: pdftk hello_world.pdf output hello_world_unc.pdf uncompress and opened it with a text editor.

Analyzing the stream I get something strange like this: [<01>5<02>-6<03>2<03>2<040506>-2 <040703>2<08>]TJ which should represent "Hello world" as an array of hexadecimal strings (in the angle brackets), and integers to specify the spacing.

I state that the file contains only this string, created precisely for educational purposes.

The problem is that they don't look like hexadecimal characters to me as they should be. That is, surely the "H" is not represented with 01.
I was expecting something like this: (Hello world) Tj.

Can anyone help me understand? Thanks in advance

答案1

得分: 0

这些数字只是字符映射表中的索引。

深入研究未压缩的PDF，您会发现一些类似以下的行：

&lt;02&gt; &lt;0065&gt;
&lt;03&gt; &lt;006C&gt;
&lt;04&gt; &lt;006F&gt;
&lt;05&gt; &lt;0020&gt;
&lt;06&gt; &lt;0077&gt;
&lt;07&gt; &lt;0072&gt;
&lt;08&gt; &lt;0064&gt;```

<details>
<summary>英文:</summary>

These numbers are just indexes into the character map.

Investigate the uncompressed PDF deeper. And you will find some lines like these:

<01> <0048>
<02> <0065>
<03> <006C>
<04> <006F>
<05> <0020>
<06> <0077>
<07> <0072>
<08> <0064>


</details>



# 答案2
**得分**: 0

- 字距调整（kerning）正在使用，因此使用了一个TJ数组而不是Tj字符串。这些数字表示以1/1000个字体单位（em）测量的字距；

- &lt;&gt; 字符串是PDF十六进制字符串，而不是普通的PDF字符串；

- 在字体中查找/ToUnicode映射。如果存在，它将帮助你将PDF码点映射到Unicode码点序列。

<details>
<summary>英文:</summary>

- kerning is in use, so a TJ array is being used instead of a Tj string. The numbers are kerns measured in 1/1000 of an em (from memory);

- The &lt;&gt; strings are PDF hex strings, not ordinary PDF strings;

- Look for a /ToUnicode map in the font. If this exists, it will help you with the mapping from PDF code points to sequences of unicode code points.

</details>

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

Strange encoding of a pdf stream

问题

答案1

iText生成的PDF与CorelDraw生成的PDF之间的区别

最佳方法将HTML页面转换为PDF文件。

将一个非常长的图形保存为PDF文件在R中会导致所有文本不可见。

ServletResponse的outputstream不能写入以大写.PDF结尾的文件。

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

发表评论