Error when parsing an embedded .xlsx file from a .ppt using apache-poi. The supplied POIFSFileSystem does not contain a BIFF8 'Workbook' entry

Question

I am facing an issue when using Apache POI to extract an embedded .xlsx file from a .ppt file. It would be really great if somebody could help me out.

Problem I am trying to solve: extracting an .xlsx file embedded inside a .ppt.

I am currently using apache-poi.

It seems that when I use hslfSlideShow.getEmbeddedObjects(), I get the xlsx object just fine, but when I try to convert it to an XSSFWorkbook using, say, WorkbookFactory.create(inputStream), it throws this error:

    java.lang.IllegalArgumentException: The supplied POIFSFileSystem does not contain a BIFF8 'Workbook' entry. Is it really an excel file? Had: [OlePres000, Ole, CompObj, Package]
        at org.apache.poi.hssf.usermodel.HSSFWorkbook.getWorkbookDirEntryName(HSSFWorkbook.java:286)
        at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:326)
        at org.apache.poi.hssf.usermodel.HSSFWorkbookFactory.createWorkbook(HSSFWorkbookFactory.java:64)
        at org.apache.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:167)
        at org.apache.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:112)
        at org.apache.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:253)
        at org.apache.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:221)

Interestingly, it is calling HSSFWorkbookFactory even though it's an xlsx file.

And no, the xlsx file is not corrupted or password-protected. I can open it just fine.

Also, it works fine if I try parsing the .xlsx file without embedding it in the .ppt.

And the parsing works fine when I embed it in a .pptx file and call methods such as xmlSlideShow.getAllEmbeddedParts() to get the embedded objects from .pptx.

Answer 1

Score: 1

Promoting some comments and investigation to an answer...

This was a limitation in older versions of Apache POI, but it was fixed in July 2020 in r1880164.

For backwards-compatibility reasons, PowerPoint will often (but not always...) write embedded OOXML resources wrapped in an intermediate OLE2 layer. This lets tools/programs that expect embedded office documents to be something like an xls/doc cope, at the expense of an extra layer of wrapping.
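
Because of that wrapper, the embedded bytes start with the OLE2 compound-file signature rather than the ZIP signature of a bare .xlsx. WorkbookFactory looks at those leading bytes, so it treats the data as OLE2 and ends up in HSSF, which matches the HSSFWorkbookFactory frames in the question's stack trace. The POI-free sketch below illustrates that kind of signature check. ContainerSniffer and its methods are hypothetical names, not POI API, and only the two well-known signatures are assumed: OLE2 is D0 CF 11 E0 A1 B1 1A E1, and ZIP is 50 4B 03 04.

```java
/** Hypothetical helper (not POI API): tells an OLE2 container from a bare OOXML/ZIP file by its first bytes. */
public class ContainerSniffer {
    public enum Container { OLE2, OOXML_ZIP, UNKNOWN }

    // OLE2 compound-file signature (.xls, .doc, .ppt and OLE2-wrapped embeddings)
    private static final byte[] OLE2_SIGNATURE = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    // ZIP local-file-header signature "PK\3\4", which a bare .xlsx starts with
    private static final byte[] ZIP_SIGNATURE = { 0x50, 0x4B, 0x03, 0x04 };

    // True if data begins with every byte of prefix
    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    /** Classifies the leading bytes of an embedded object's data. */
    public static Container sniff(byte[] header) {
        if (startsWith(header, OLE2_SIGNATURE)) {
            return Container.OLE2;      // an OLE2 wrapper looks like this even if it holds an .xlsx
        }
        if (startsWith(header, ZIP_SIGNATURE)) {
            return Container.OOXML_ZIP; // a plain, unwrapped .xlsx looks like this
        }
        return Container.UNKNOWN;
    }

    public static void main(String[] args) {
        byte[] wrapped = java.util.Arrays.copyOf(OLE2_SIGNATURE, 16); // zero-padded, like an OLE2-wrapped embedding
        byte[] bareXlsx = java.util.Arrays.copyOf(ZIP_SIGNATURE, 16); // zero-padded, like a plain .xlsx
        System.out.println(sniff(wrapped));  // OLE2
        System.out.println(sniff(bareXlsx)); // OOXML_ZIP
    }
}
```

The check only recognises the outer container. An OLE2 result says nothing yet about whether the payload is a real .xls or a wrapped .xlsx.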

Newer versions of Apache POI (5.0 should be the first release to include the fix) support receiving an OLE2 wrapper like this in WorkbookFactory. They pull out the underlying xlsx stream and hand it off to XSSFWorkbook. Older versions already did this for OLE2-based password-protected xlsx files, but not for their unencrypted cousins.
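
Inside the OLE2 container, the choice between HSSF and XSSF comes down to which root entries are present. The question's exception lists [OlePres000, Ole, CompObj, Package]: there is no BIFF8 'Workbook' stream, but there is an OOXML 'Package' stream.

A minimal sketch of that dispatch rule follows. It assumes you already have the root entry names, which POI's root DirectoryEntry exposes via getEntryNames(). The class and enum are hypothetical, and 'EncryptedPackage' is assumed as the stream name for password-protected OOXML. It needs Java 9+ for Set.of.

```java
import java.util.Set;

/** Hypothetical helper (not POI API): decides what an OLE2 embedding holds from its root entry names. */
public class Ole2PayloadKind {
    public enum Kind { BIFF8_XLS, WRAPPED_OOXML, ENCRYPTED_OOXML, UNKNOWN }

    public static Kind classify(Set<String> rootEntries) {
        if (rootEntries.contains("EncryptedPackage")) {
            return Kind.ENCRYPTED_OOXML; // password-protected .xlsx: decrypt, then XSSF
        }
        if (rootEntries.contains("Package")) {
            return Kind.WRAPPED_OOXML;   // plain .xlsx inside an OLE2 wrapper: unwrap, then XSSF
        }
        if (rootEntries.contains("Workbook")) {
            return Kind.BIFF8_XLS;       // genuine BIFF8 .xls: HSSF
        }
        return Kind.UNKNOWN;
    }

    public static void main(String[] args) {
        // Entry names taken from the exception message in the question
        System.out.println(classify(Set.of("OlePres000", "Ole", "CompObj", "Package"))); // WRAPPED_OOXML
        System.out.println(classify(Set.of("Workbook", "CompObj")));                     // BIFF8_XLS
    }
}
```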

For now, if you're stuck on an affected POI version, the code you'll want is something like this (largely taken from the unit test verifying support!):

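import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.poifs.filesystem.DocumentEntry;
import org.apache.poi.poifs.filesystem.DocumentInputStream;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

// "data" is an embedded object from hslfSlideShow.getEmbeddedObjects();
// handleWorkbook(...) stands for your own processing code.
try (POIFSFileSystem fs = new POIFSFileSystem(data.getInputStream())) {
    if (fs.getRoot().hasEntry("Package")) {
        // OLE2 wrapper around an .xlsx: unwrap the OOXML package stream
        DocumentEntry entry = (DocumentEntry) fs.getRoot().getEntry("Package");
        try (DocumentInputStream dis = new DocumentInputStream(entry);
             OPCPackage pkg = OPCPackage.open(dis);
             XSSFWorkbook wb = new XSSFWorkbook(pkg)) {
            handleWorkbook(wb);
        }
    } else {
        // No "Package" entry: treat it as a genuine BIFF8 .xls
        try (HSSFWorkbook wb = new HSSFWorkbook(fs)) {
            handleWorkbook(wb);
        }
    }
}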

  • Posted by huangapple on 9 October 2020 at 03:20:27
  • When reposting, please keep the link to this article: https://go.coder-hub.com/64269294.html