What is the difference between the two fonts?

Question

When I print TextRenderInfo.getFont().getPostscriptFontName() for two PDF files, I get AAAAAD+SourceHanSansCN-Normal and BIISMY+SourceHanSansCN-Normal.

I know that SourceHanSansCN-Normal follows the FontName-FontScript format, but what is AAAAAD? It does not look like a font family.

Example code:

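import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.parser.ImageRenderInfo;
import com.itextpdf.text.pdf.parser.PdfTextExtractor;
import com.itextpdf.text.pdf.parser.TextExtractionStrategy;
import com.itextpdf.text.pdf.parser.TextRenderInfo;

import java.io.IOException;

public class CheckPdfAllFontTest implements TextExtractionStrategy {

    public static final String SRC = "ownTestFile.pdf";

    @Override
    public String getResultantText() {
        // No text is collected here; the font names are printed in renderText.
        return "";
    }

    @Override
    public void beginTextBlock() {
        // Not needed for listing fonts.
    }

    @Override
    public void renderText(TextRenderInfo textRenderInfo) {
        // Print each text chunk together with the PostScript name of its font.
        String fontName = textRenderInfo.getFont().getPostscriptFontName();
        String text = textRenderInfo.getText();
        System.out.println(text + "=====" + fontName);
    }

    @Override
    public void endTextBlock() {
        // Not needed for listing fonts.
    }

    @Override
    public void renderImage(ImageRenderInfo imageRenderInfo) {
        // Images are ignored.
    }

    public static void main(String[] args) throws IOException {
        new CheckPdfAllFontTest().parse(SRC);
    }

    public void parse(String filename) throws IOException {
        int pageNumber = 1;
        PdfReader reader = new PdfReader(filename);
        // Extraction drives the callbacks above; the returned string is empty.
        System.out.println(PdfTextExtractor.getTextFromPage(reader, pageNumber, new CheckPdfAllFontTest()));
        reader.close();
    }
}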

iText version:

<dependency>
    <groupId>com.itextpdf</groupId>
    <artifactId>itextpdf</artifactId>
    <version>5.5.8</version>
</dependency>

The two PDF files were exported from a PowerPoint file, one with the "embed fonts" setting and one without it.

  • "AAAAAD+SourceHanSansCN-Normal" comes from the PDF exported with "embed fonts".
  • "BIISMY+SourceHanSansCN-Normal" comes from the PDF exported without "embed fonts".

I am collecting the fonts used in PDF files and found font names in this format. What does the part before the '+' mean, and where is it defined?

Answer 1

Score: 1

According to the PDF specification:

>### 9.9.2 Font subsets
>
>PDF documents may include subsets of PDF fonts whose Subtype is Type1, TrueType or OpenType. The font and font descriptor that describe a font subset are slightly different from those of ordinary fonts. These differences allow a PDF processor to recognise font subsets and to merge documents containing different subsets of the same font. (For more information on font descriptors, see 9.8, "Font descriptors".)
>
>For a font subset, the PostScript name of the font, that is, the value of the font’s BaseFont entry and the font descriptor’s FontName entry, shall begin with a tag followed by a plus sign (+) followed by the PostScript name of the font from which the subset was created. The tag shall consist of exactly six uppercase letters; the choice of letters is arbitrary, but different subsets of the same font in the same PDF file shall have different tags. The glyph name .notdef shall be defined in the font subset.
>
>NOTE It is recommended that PDF processors treat multiple subset fonts as completely independent entities, even if they appear to have been created from the same original font.
>
>EXAMPLE EOODIA+Poetica is the name of a subset of Poetica®, a Type 1 font.

(ISO 32000-2)

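Thus, the six letters before the plus sign (AAAAAD and BIISMY) are subset tags, and AAAAAD+SourceHanSansCN-Normal and BIISMY+SourceHanSansCN-Normal are most likely different subsets of the same source font, SourceHanSansCN-Normal.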
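
If the goal is only to collect the names of the fonts a PDF uses, one possible approach is to strip the subset tag from each reported name. The following is a minimal sketch based on the naming rule quoted above (six uppercase letters, then a plus sign); the class `SubsetFontNames` and its methods are hypothetical helpers for illustration and are not part of iText. It works on plain strings, so the names could come from `getPostscriptFontName()` or anywhere else.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical helper, not an iText class.
public class SubsetFontNames {

    // Subset tag per ISO 32000-2, 9.9.2: exactly six uppercase letters, then '+'.
    private static final Pattern SUBSET_PREFIX = Pattern.compile("^[A-Z]{6}\\+");

    /** Returns true if the name starts with a subset tag such as "AAAAAD+". */
    public static boolean isSubset(String postscriptName) {
        return SUBSET_PREFIX.matcher(postscriptName).find();
    }

    /** Removes the subset tag if present; other names are returned unchanged. */
    public static String stripSubsetTag(String postscriptName) {
        return SUBSET_PREFIX.matcher(postscriptName).replaceFirst("");
    }

    public static void main(String[] args) {
        // The two names reported in the question.
        List<String> reported = Arrays.asList(
                "AAAAAD+SourceHanSansCN-Normal",
                "BIISMY+SourceHanSansCN-Normal");

        // A set collapses subsets of the same source font into one entry.
        Set<String> baseNames = new LinkedHashSet<>();
        for (String name : reported) {
            baseNames.add(stripSubsetTag(name));
        }
        System.out.println(baseNames); // prints [SourceHanSansCN-Normal]
    }
}
```

Note that this only normalizes names for reporting. As the NOTE in the specification says, the subsets themselves should still be treated as independent fonts, since each may contain a different set of glyphs.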

  • Posted by huangapple on 2023-02-14 19:07:18
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75446930.html