September 11, 2020 16:06:58


How to convert docx to xhtml

Question


I am trying to find a solution to convert a docx file to XHTML.

I found XDocReport, which looks good, but I have some issues (and I am new to XDocReport).

According to their documentation on GitHub (here and here), I should be able to convert with this code:

    String source = args[0];
    String dest = args[1];
    // 1) Create DOCX-to-XHTML options to select the right converter from the registry
    Options options = Options.getFrom(DocumentKind.DOCX).to(ConverterTypeTo.XHTML);
    // 2) Get the converter from the registry
    IConverter converter = ConverterRegistry.getRegistry().getConverter(options);
    // 3) Convert DOCX to (X)HTML; try-with-resources closes both streams
    try (InputStream in = new FileInputStream(new File(source));
         OutputStream out = new FileOutputStream(new File(dest))) {
        converter.convert(in, out, options);
    } catch (XDocConverterException | IOException e) {
        e.printStackTrace();
    }

I am using these dependencies (I have tried different versions: 2.0.2, 2.0.0 and 1.0.6):

    <dependency>
        <groupId>fr.opensagres.xdocreport</groupId>
        <artifactId>fr.opensagres.xdocreport.document.docx</artifactId>
        <version>2.0.2</version>
    </dependency>
    <dependency>
        <groupId>fr.opensagres.xdocreport</groupId>
        <artifactId>fr.opensagres.xdocreport.template.freemarker</artifactId>
        <version>2.0.2</version>
    </dependency>
    <dependency>
        <groupId>fr.opensagres.xdocreport</groupId>
        <artifactId>fr.opensagres.xdocreport.converter.docx.xwpf</artifactId>
        <version>2.0.2</version>
    </dependency>

My issues:

- The images are missing.
- The background color is missing (all pages have a non-white background color, and I need to convert that too).

How can I handle these issues?
(Or, alternatively, how can I convert docx to XHTML with Docx4j while keeping formatting, numbering and images?)

Answer 1

Score: 2



To convert `*.docx` to `XHTML` using `XDocReport` with `apache poi`'s `XWPFDocument` as the source, you need `XHTMLOptions`. These options can carry an `ImageManager`, which sets the path for the images extracted from the `XWPFDocument`. `XHTMLConverter` then does the conversion.

Complete example:

    import java.io.*;

    //needed jars: xdocreport-2.0.2.jar
    import fr.opensagres.poi.xwpf.converter.xhtml.XHTMLConverter;
    import fr.opensagres.poi.xwpf.converter.xhtml.XHTMLOptions;
    import fr.opensagres.poi.xwpf.converter.core.ImageManager;

    //needed jars: all apache poi dependencies
    import org.apache.poi.xwpf.usermodel.*;

    public class DOCXToXHTMLXDocReport {

     public static void main(String[] args) throws Exception {

      String docPath = "./WordDocument.docx";

      String root = "./";
      String htmlPath = root + "WordDocument.html";

      // try-with-resources closes the input stream, the document and the output stream
      try (InputStream in = new FileInputStream(docPath);
           XWPFDocument document = new XWPFDocument(in);
           OutputStream out = new FileOutputStream(htmlPath)) {

       // images are extracted into the "images" sub-folder of root and referenced from the HTML
       XHTMLOptions options = XHTMLOptions.create().setImageManager(new ImageManager(new File(root), "images"));

       XHTMLConverter.getInstance().convert(document, out, options);
      }

     }
    }

This handles images properly.

However, `XDocReport` is so far unable to handle the page background color of an `XWPFDocument` properly. It extracts and handles paragraph background colors, but not page background colors.
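Since the page color is not carried over by the converter, one possible workaround is to read it from the `.docx` yourself and add it to the generated file afterwards. The following is a minimal, stdlib-only post-processing sketch, not part of XDocReport. It assumes that Word stores the page color in the `w:color` attribute of `w:background` in `word/document.xml` (as described in ECMA-376), ignores theme colors, and assumes the converter output contains a literal `</head>`. The class and method names (`PageBackground`, `readPageColor`, `injectBackground`) are hypothetical.

```java
import java.io.File;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical post-processing step: copy the DOCX page color into the XHTML written by XDocReport. */
public class PageBackground {

    // WordprocessingML main namespace (transitional OOXML), bound to the w: prefix in word/document.xml.
    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Reads w:document/w:background/@w:color; empty if the element is absent or the color is "auto". */
    static Optional<String> readPageColor(File docx) throws Exception {
        try (ZipFile zip = new ZipFile(docx)) {
            ZipEntry entry = zip.getEntry("word/document.xml");
            if (entry == null) {
                return Optional.empty();
            }
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            // Refuse DOCTYPE declarations so a crafted file cannot trigger external entity expansion.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            try (InputStream in = zip.getInputStream(entry)) {
                Document xml = factory.newDocumentBuilder().parse(in);
                NodeList backgrounds = xml.getElementsByTagNameNS(W_NS, "background");
                if (backgrounds.getLength() == 0) {
                    return Optional.empty();
                }
                // getAttributeNS returns "" when the attribute is missing.
                String color = ((Element) backgrounds.item(0)).getAttributeNS(W_NS, "color");
                if (color.isEmpty() || color.equalsIgnoreCase("auto")) {
                    return Optional.empty();
                }
                return Optional.of(color);
            }
        }
    }

    /** Inserts a body background rule just before </head>; returns the HTML unchanged if there is no </head>. */
    static String injectBackground(String html, String hexColor) {
        int headEnd = html.indexOf("</head>");
        if (headEnd < 0) {
            return html;
        }
        String rule = "<style type=\"text/css\">body { background-color: #" + hexColor + "; }</style>";
        return html.substring(0, headEnd) + rule + html.substring(headEnd);
    }

    // Usage: java PageBackground WordDocument.docx WordDocument.html
    public static void main(String[] args) throws Exception {
        Optional<String> color = readPageColor(new File(args[0]));
        if (color.isPresent()) {
            Path html = Paths.get(args[1]);
            String content = new String(Files.readAllBytes(html), StandardCharsets.UTF_8);
            Files.write(html, injectBackground(content, color.get()).getBytes(StandardCharsets.UTF_8));
        }
    }
}
```

Run it after `XHTMLConverter` has written the file, e.g. `java PageBackground WordDocument.docx WordDocument.html`. A CSS rule in `<head>` is used rather than a `style` attribute on `<body>` so that an existing attribute on the body is never duplicated.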
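To confirm that the `ImageManager` setup in the answer really fixes the missing images, one option is to scan the generated HTML for `<img>` references that do not resolve to a file on disk. This is a hypothetical helper (`ImageLinkCheck`, `missingImages`) that uses a simple regular expression rather than a real HTML parser, and it assumes the converter writes double-quoted `src` values relative to the folder of the HTML file.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical check: lists <img> sources in a generated HTML file that do not exist on disk. */
public class ImageLinkCheck {

    // Captures the double-quoted src attribute of each <img> tag (a quick regex, not a full HTML parser).
    private static final Pattern IMG_SRC =
            Pattern.compile("<img\\b[^>]*?\\ssrc\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    static List<String> missingImages(Path htmlFile) throws IOException {
        String html = new String(Files.readAllBytes(htmlFile), StandardCharsets.UTF_8);
        Path baseDir = htmlFile.toAbsolutePath().getParent();
        List<String> missing = new ArrayList<>();
        Matcher m = IMG_SRC.matcher(html);
        while (m.find()) {
            String src = m.group(1);
            // Embedded (data:) and remote images cannot be checked against the local file system.
            if (src.startsWith("data:") || src.contains("://")) {
                continue;
            }
            try {
                if (!Files.isRegularFile(baseDir.resolve(src))) {
                    missing.add(src);
                }
            } catch (InvalidPathException e) {
                missing.add(src); // not a usable relative path on this platform
            }
        }
        return missing;
    }

    // Usage: java ImageLinkCheck WordDocument.html
    public static void main(String[] args) throws IOException {
        List<String> missing = missingImages(Paths.get(args[0]));
        if (missing.isEmpty()) {
            System.out.println("All local images resolved.");
        } else {
            missing.forEach(src -> System.out.println("Missing image: " + src));
        }
    }
}
```

An empty result means every relative `src` in the output points to an existing file, which is what you would expect once the images have been extracted into the `images` folder.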
