Getting ###### instead of Emojis When Generating PDF using Apache FOP 2.4
Question
I'm generating PDFs in a web application that uses Apache FOP 2.4. When the content contains emojis, they appear in the PDF as ###############, while normal alphanumeric text renders properly.
This is the renderers section of my fop.xconf file:
<renderers>
	<renderer mime="application/pdf">
		<filterList>
			<value>flate</value>
		</filterList>
		<fonts>
			<font embed-url="ARIAL.TTF">
				<font-triplet name="Arial" style="normal" weight="normal"/>
			</font>
			<font embed-url="ARIALBD.TTF">
				<font-triplet name="Arial" style="normal" weight="bold"/>
			</font>
			<font embed-url="ARIALI.TTF">
				<font-triplet name="Arial" style="italic" weight="normal"/>
			</font>
			<font embed-url="ARIALBI.TTF">
				<font-triplet name="Arial" style="italic" weight="bold"/>
			</font>
			<font embed-url="ARIALUNI.TTF">
				<font-triplet name="Arial Unicode MS" style="normal" weight="normal"/>
			</font>
			<font embed-url="NotoColorEmoji-Regular.ttf">
				<font-triplet name="Noto Color Emoji" style="normal" weight="normal"/>
			</font>
			<font embed-url="MSGOTHIC.TTC" metrics-url="msgothic.xml">
				<font-triplet name="Gothic" style="normal" weight="normal"/>
				<font-triplet name="Gothic" style="normal" weight="bold"/>
				<font-triplet name="Gothic" style="italic" weight="normal"/>
				<font-triplet name="Gothic" style="italic" weight="bold"/>
			</font>
		</fonts>
	</renderer>
</renderers>
I added NotoColorEmoji-Regular.ttf to support emojis, but they still do not render properly in the PDF.
All the font files are in the same directory as the fop.xconf file.
How can I resolve this emoji issue?
[Update as of 11/04/2023]
Once I added "Noto Color Emoji" to the font-family attribute in the XSL file, the ####### symbols stopped appearing. Instead, the emojis now show up as empty white symbols.
In order to test this further, I created a sample application that generates PDF files using Apache FOP. You can find the project here <https://github.com/sachindragh/pdf-testing>.
The following are the Java code, the XSL file, input.xml, and the fop.xconf file from that project.
public static void main(String[] args) throws SAXException, TransformerException, IOException {
    FopFactory fopFactory = FopFactory.newInstance(new File("./fop.xconf"));
    OutputStream out = new BufferedOutputStream(new FileOutputStream(new File("fop2_8output.pdf")));
    try {
        Fop fop = fopFactory.newFop(MimeConstants.MIME_PDF, out);
        TransformerFactory factory = TransformerFactory.newInstance();
        Transformer transformer = factory.newTransformer(new StreamSource(new File("stylesheet.xsl")));
        Source src = new StreamSource(new File("input.xml"));
        Result res = new SAXResult(fop.getDefaultHandler());
        transformer.transform(src, res);
        System.out.println("PDF file generated successfully.");
    } finally {
        out.close();
    }
}
stylesheet.xsl
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:fo="http://www.w3.org/1999/XSL/Format">
    <xsl:output method="xml" indent="yes"/>
    <xsl:template match="/">
        <!-- Vladimir Script    VLADIMIR.TTF -->
        <!-- Viner Hand ITC     VINERITC.TTF -->
        <!-- Noto Color Emoji   NotoColorEmoji-Regular.ttf -->
        <fo:root font-family="Noto Color Emoji,Viner Hand ITC">
            <fo:layout-master-set>
                <fo:simple-page-master master-name="A4-portrait"
                                       page-height="29.7cm" page-width="21.0cm" margin="2cm">
                    <fo:region-body/>
                </fo:simple-page-master>
            </fo:layout-master-set>
            <fo:page-sequence master-reference="A4-portrait">
                <fo:flow flow-name="xsl-region-body">
                    <fo:block font-family="Noto Color Emoji,Viner Hand ITC">
                        <xsl:value-of select="value"/>
                    </fo:block>
                </fo:flow>
            </fo:page-sequence>
        </fo:root>
    </xsl:template>
</xsl:stylesheet>
input.xml
<?xml version="1.0" encoding="UTF-8"?>
<value>
    This is a sample text Emojis
    BMP Glyphs  &#x2663; &#x2705;
    Non-BMP Glyphs  &#x1F410; &#x1F600;
</value>
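For reference, the character references in input.xml above can be classified with the standard Java `Character` API. The following is a minimal sketch, not part of the sample project; the class name `InputCodePoints` and the helper `describe` are names invented here for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class InputCodePoints {
    // Hypothetical helper: label every non-ASCII code point in the text
    // as BMP (<= U+FFFF) or non-BMP (supplementary planes).
    static List<String> describe(String text) {
        List<String> out = new ArrayList<>();
        text.codePoints()
            .filter(cp -> cp > 0x7F)
            .forEach(cp -> out.add(String.format("U+%04X %s",
                    cp, Character.isBmpCodePoint(cp) ? "BMP" : "non-BMP")));
        return out;
    }

    public static void main(String[] args) {
        // Same characters as the numeric references in input.xml.
        String text = "BMP \u2663 \u2705 non-BMP "
                + new String(Character.toChars(0x1F410)) + " "
                + new String(Character.toChars(0x1F600));
        describe(text).forEach(System.out::println);
    }
}
```

Running `main` prints one line per non-ASCII code point, labelling U+2663 and U+2705 as BMP and U+1F410 and U+1F600 as non-BMP.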
fop.xconf
<?xml version="1.0"?>
<fop version="1.0">
  <renderers>
    <renderer mime="application/pdf">
      <filterList>
        <value>flate</value>
      </filterList>
      <fonts>
        <font embed-url="NotoColorEmoji-Regular.ttf">
            <font-triplet name="Noto Color Emoji" style="normal" weight="normal"/>
        </font>
        <font embed-url="VINERITC.TTF">
          <font-triplet name="Viner Hand ITC" style="normal" weight="normal"/>
        </font>
        <!--<font embed-url="VLADIMIR.TTF">
          <font-triplet name="Vladimir Script" style="normal" weight="normal"/>
        </font>-->
      </fonts>
      <version>1.7</version>
    </renderer>
  </renderers>
</fop>
I ran the above example with Apache FOP versions 2.4 and 2.8, with the PDF version set to 1.4 and to 1.7 for each library version. In input.xml I included both BMP glyphs (U+2663, U+2705) and non-BMP glyphs (U+1F410, U+1F600). I get the same result in every case: empty white symbols for both the BMP and the non-BMP glyphs. The following is what the PDF output looks like.
When I do a select all, I can see some characters getting selected in those empty spaces.
And when I copy and paste the selected text into Notepad, some of the characters appear as follows.
I also looked into the FOP issue <https://issues.apache.org/jira/browse/FOP-1969> mentioned by @Kevin Brown. It seems a fix for that issue was already merged in the 2.3 release.
Can anyone help me figure out this issue, or suggest any options I can try to solve it?
Answer 1
Score: 2
I do not believe that is supported in FOP. Almost all the characters in that font are in the Supplementary Multilingual Plane. From the FOP documentation:
"Support for Unicode characters outside of the Base Multilingual Plane (BMP), i.e., characters whose code points are greater than 65535, is not yet implemented. See issue FOP-1969."
For instance:
Unicode character “🐐” (U+1F410) has code point 0x1F410 = 128016, which is greater than 65535.
If you install that font on a Windows machine and view it with charmap, you will find only about 35 characters. Searching for one of the emoji shows no character.
If you view the entire font, those few characters are all you see.
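The arithmetic above can be checked with the standard Java `Character` API. The following is a minimal sketch under that assumption only; it does not touch FOP, and the class name `GoatCodePoint` and the helper `utf16Units` are made up for illustration. It shows that U+1F410 lies above 65535 and therefore needs two UTF-16 code units (a surrogate pair), whereas a BMP character such as U+2663 needs one.

```java
public class GoatCodePoint {
    // Highest code point in the BMP.
    static final int BMP_MAX = 0xFFFF; // 65535

    // Hypothetical helper: the UTF-16 code units of one code point, in hex.
    static String utf16Units(int codePoint) {
        StringBuilder sb = new StringBuilder();
        for (char unit : Character.toChars(codePoint)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%04X", (int) unit));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int goat = 0x1F410;
        System.out.println(goat);               // 128016
        System.out.println(goat > BMP_MAX);     // true
        System.out.println(utf16Units(goat));   // D83D DC10
        System.out.println(utf16Units(0x2663)); // 2663
    }
}
```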