How to convert docx to xhtml
Question
I am trying to find a solution to convert a docx file to XHTML.
I found XDocReport, which looks good, but I have run into some issues (and I am new to XDocReport).
According to their documentation on GitHub (here and here), I should be able to convert with this code:
    String source = args[0];
    String dest = args[1];

    // 1) Create DOCX-to-XHTML options to select the right converter from the registry
    Options options = Options.getFrom(DocumentKind.DOCX).to(ConverterTypeTo.XHTML);

    // 2) Get the converter from the registry
    IConverter converter = ConverterRegistry.getRegistry().getConverter(options);

    // 3) Convert DOCX to (X)HTML; try-with-resources closes both streams
    try (InputStream in = new FileInputStream(new File(source));
         OutputStream out = new FileOutputStream(new File(dest))) {
        converter.convert(in, out, options);
    } catch (XDocConverterException | IOException e) {
        e.printStackTrace();
    }
I am using these dependencies (tried different versions, like 2.0.2, 2.0.0, 1.0.6):
    <dependency>
        <groupId>fr.opensagres.xdocreport</groupId>
        <artifactId>fr.opensagres.xdocreport.document.docx</artifactId>
        <version>2.0.2</version>
    </dependency>
    <dependency>
        <groupId>fr.opensagres.xdocreport</groupId>
        <artifactId>fr.opensagres.xdocreport.template.freemarker</artifactId>
        <version>2.0.2</version>
    </dependency>
    <dependency>
        <groupId>fr.opensagres.xdocreport</groupId>
        <artifactId>fr.opensagres.xdocreport.converter.docx.xwpf</artifactId>
        <version>2.0.2</version>
    </dependency>
My issues:
- The images are missing
- The background color is missing (every page has a non-white background color, and I need to convert that too)
How can I handle these issues?
(Or how can I convert docx to XHTML using Docx4j while keeping formatting, numbering, and images?)
Answer 1
Score: 2
To convert `*.docx` to XHTML using XDocReport with Apache POI's `XWPFDocument` as the source, you need `XHTMLOptions`. Those options can hold an `ImageManager`, which sets the path where images extracted from the `XWPFDocument` are stored. `XHTMLConverter` then performs the conversion.

Complete example:

    import java.io.*;

    // needed jars: xdocreport-2.0.2.jar
    import fr.opensagres.poi.xwpf.converter.xhtml.XHTMLConverter;
    import fr.opensagres.poi.xwpf.converter.xhtml.XHTMLOptions;
    import fr.opensagres.poi.xwpf.converter.core.ImageManager;

    // needed jars: all Apache POI dependencies
    import org.apache.poi.xwpf.usermodel.*;

    public class DOCXToXHTMLXDocReport {

        public static void main(String[] args) throws Exception {
            String docPath = "./WordDocument.docx";
            String root = "./";
            String htmlPath = root + "WordDocument.html";

            XWPFDocument document = new XWPFDocument(new FileInputStream(docPath));
            // Extracted images go into the "images" folder under root
            XHTMLOptions options = XHTMLOptions.create().setImageManager(new ImageManager(new File(root), "images"));

            FileOutputStream out = new FileOutputStream(htmlPath);
            XHTMLConverter.getInstance().convert(document, out, options);

            out.close();
            document.close();
        }
    }

This handles images properly.

However, XDocReport is so far unable to handle the page background color of an `XWPFDocument` properly. It extracts and handles paragraph background colors, but not page background colors.
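One possible workaround for the missing page background is to read the color from the `.docx` package yourself and patch the generated XHTML afterwards. The following is only a sketch under assumptions, not something XDocReport or Apache POI provides: `PageBackgroundSketch`, `readPageBackground` and `applyBodyBackground` are hypothetical names. It assumes the page color is stored as a plain hex value in the `w:color` attribute of the `w:background` element in `word/document.xml`, and that the converter output contains a `</head>` tag. Theme colors, gradients and picture backgrounds are not handled. It uses only the JDK (Java 9 or later for `readAllBytes`).

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.Optional;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper; not part of XDocReport or Apache POI.
class PageBackgroundSketch {

    private static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Returns the hex page color (e.g. "FFCC00") from word/document.xml, if one is set. */
    static Optional<String> readPageBackground(InputStream docx) throws Exception {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            for (ZipEntry entry; (entry = zip.getNextEntry()) != null; ) {
                if (!"word/document.xml".equals(entry.getName())) {
                    continue;
                }
                DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
                factory.setNamespaceAware(true);
                // Refuse DOCTYPE declarations, since the file may come from an untrusted source
                factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
                Document xml = factory.newDocumentBuilder()
                        .parse(new ByteArrayInputStream(zip.readAllBytes()));

                NodeList nodes = xml.getElementsByTagNameNS(W_NS, "background");
                if (nodes.getLength() == 0) {
                    return Optional.empty();
                }
                String color = ((Element) nodes.item(0)).getAttributeNS(W_NS, "color");
                // Only plain six-digit hex values are accepted; "auto" and others are ignored
                return color.matches("[0-9A-Fa-f]{6}") ? Optional.of(color) : Optional.empty();
            }
        }
        return Optional.empty();
    }

    /** Inserts a CSS rule for the body background just before the closing head tag. */
    static String applyBodyBackground(String xhtml, String hexColor) {
        int headEnd = xhtml.indexOf("</head>");
        if (headEnd < 0) {
            return xhtml; // no head element found (assumption not met), leave untouched
        }
        String rule = "<style type=\"text/css\">body{background-color:#"
                + hexColor + ";}</style>";
        return xhtml.substring(0, headEnd) + rule + xhtml.substring(headEnd);
    }
}
```

To use it with the example above, you could let `XHTMLConverter` write into a `ByteArrayOutputStream` instead of a file, call `readPageBackground` on a fresh `FileInputStream` of the same `.docx`, and, if a color is present, pass the converted markup through `applyBodyBackground` before writing it to `htmlPath`.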