How to convert docx to xhtml

Question

I am trying to find a solution to convert a docx file to XHTML.

I found XDocReport, which looks good, but I have run into some issues (and I am new to XDocReport).

According to their documentation on GitHub (here and here), I should be able to convert with this code:

    String source = args[0];
    String dest = args[1];

    // 1) Create DOCX-to-XHTML options to select the right converter from the registry
    Options options = Options.getFrom(DocumentKind.DOCX).to(ConverterTypeTo.XHTML);

    // 2) Get the converter from the registry
    IConverter converter = ConverterRegistry.getRegistry().getConverter(options);

    // 3) Convert DOCX to (x)html
    try {
        InputStream in = new FileInputStream(new File(source));
        OutputStream out = new FileOutputStream(new File(dest));
        converter.convert(in, out, options);
    } catch (XDocConverterException | FileNotFoundException e) {
        e.printStackTrace();
    }

I am using these dependencies (tried different versions, like 2.0.2, 2.0.0, 1.0.6):

    <dependency>
        <groupId>fr.opensagres.xdocreport</groupId>
        <artifactId>fr.opensagres.xdocreport.document.docx</artifactId>
        <version>2.0.2</version>
    </dependency>

    <dependency>
        <groupId>fr.opensagres.xdocreport</groupId>
        <artifactId>fr.opensagres.xdocreport.template.freemarker</artifactId>
        <version>2.0.2</version>
    </dependency>

    <dependency>
        <groupId>fr.opensagres.xdocreport</groupId>
        <artifactId>fr.opensagres.xdocreport.converter.docx.xwpf</artifactId>
        <version>2.0.2</version>
    </dependency>

My issues:

  • The images are missing
  • The background color is missing (all pages have a background color, which is not white and I have to convert this too)

How can I handle these issues?
(Or how can I convert docx to xhtml using Docx4j with formats/numbering/images?)

Answer 1

Score: 2

To convert `*.docx` to `XHTML` using `XDocReport` with `apache poi`'s `XWPFDocument` as the source, you will need `XHTMLOptions`. These options can hold an `ImageManager`, which sets the path for images extracted from the `XWPFDocument`. An `XHTMLConverter` then performs the conversion.

Complete example:

    import java.io.*;

    //needed jars: xdocreport-2.0.2.jar
    import fr.opensagres.poi.xwpf.converter.xhtml.XHTMLConverter;
    import fr.opensagres.poi.xwpf.converter.xhtml.XHTMLOptions;
    import fr.opensagres.poi.xwpf.converter.core.ImageManager;

    //needed jars: all apache poi dependencies
    import org.apache.poi.xwpf.usermodel.*;

    public class DOCXToXHTMLXDocReport {

        public static void main(String[] args) throws Exception {

            String docPath = "./WordDocument.docx";

            String root = "./";
            String htmlPath = root + "WordDocument.html";

            // Extracted images are stored below the "images" folder under root
            XHTMLOptions options = XHTMLOptions.create().setImageManager(new ImageManager(new File(root), "images"));

            // try-with-resources closes the document and the output stream even if the conversion fails
            try (XWPFDocument document = new XWPFDocument(new FileInputStream(docPath));
                 FileOutputStream out = new FileOutputStream(htmlPath)) {
                XHTMLConverter.getInstance().convert(document, out, options);
            }
        }
    }

This handles images properly.

However, `XDocReport` does not yet handle page background colors of an `XWPFDocument` properly. It extracts and handles paragraph background colors, but not page background colors.
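Since the page background is not converted, one possible workaround is to post-process the generated file. The following is only a minimal sketch, not something `XDocReport` or `apache poi` provides: the class `PageBackground` and its methods `readPageBackground` and `applyBackground` are hypothetical names, and only JDK classes are used. It assumes the page color is stored as the `w:color` attribute of the `w:background` element in `word/document.xml`, that the generated XHTML is UTF-8, and that it contains a `<body>` tag without an existing `style` attribute.

```java
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of XDocReport or Apache POI.
class PageBackground {

    private static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the six-digit hex color of w:background, or null if none is set.
    static String readPageBackground(InputStream docx) throws Exception {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!"word/document.xml".equals(entry.getName())) {
                    continue;
                }
                DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
                factory.setNamespaceAware(true);
                Document dom = factory.newDocumentBuilder().parse(zip);
                NodeList nodes = dom.getElementsByTagNameNS(W_NS, "background");
                if (nodes.getLength() == 0) {
                    return null;
                }
                String color = ((Element) nodes.item(0)).getAttributeNS(W_NS, "color");
                // Values such as "auto" carry no concrete color, so they are ignored.
                return color.matches("[0-9A-Fa-f]{6}") ? color : null;
            }
        }
        return null;
    }

    // Adds a background-color style to the first <body> tag (assumed to have no style yet).
    static String applyBackground(String xhtml, String hexColor) {
        if (hexColor == null) {
            return xhtml;
        }
        return xhtml.replaceFirst("<body(\\s|>)",
                "<body style=\"background-color:#" + hexColor + "\"$1");
    }

    // Usage: java PageBackground WordDocument.docx WordDocument.html
    public static void main(String[] args) throws Exception {
        Path docx = Paths.get(args[0]);
        Path html = Paths.get(args[1]);
        String color;
        try (InputStream in = Files.newInputStream(docx)) {
            color = readPageBackground(in);
        }
        String xhtml = new String(Files.readAllBytes(html), StandardCharsets.UTF_8);
        Files.write(html, applyBackground(xhtml, color).getBytes(StandardCharsets.UTF_8));
    }
}
```

Run it after the conversion, passing the source `*.docx` and the generated HTML file. Background images or fills other than a plain color are not covered by this sketch.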



huangapple
  • Posted on September 11, 2020, 16:06:58
  • When reposting, please keep the link to this article: https://go.coder-hub.com/63843154.html