July 31, 2023, 20:28:59

How to extract table data, such as Excel table cells and table rows, from a PPT with Apache POI

Question

I am parsing a PPTX document with Apache POI. I can extract images and embedded OLE objects from a slide (XSLFSlide), but I don't know how to extract embedded XLS tables from the slide. How can I parse and read embedded XLS tables in a PPT slide?

Answer 1

Score: 1

From the POI docs at https://poi.apache.org/components/poifs/embeded.html

> Files embedded in PowerPoint - PowerPoint does not normally store embedded files in the OLE2 layer. Instead, they are held within records of the main PowerPoint file. See the HSLF Tutorial for how to retrieve embedded OLE objects from a presentation.

> Opening embedded files - All of the POIDocument classes (HSSFWorkbook, HSLFSlideShow, HWPFDocument and HDGFDiagram) can either be opened from a POIFSFileSystem, or from a specific directory within a POIFSFileSystem. So, to open embedded files, simply locate the appropriate DirectoryNode that represents the subdirectory of interest, and pass this + the overall POIFSFileSystem to the constructor.

> If you want to extract the textual contents of the embedded file, then open the appropriate POIDocument, and then pass this to the extractor class, instead of simply passing the POIFSFilesystem to the extractor.

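And from the HSLF Tutorial:

// Names follow the older tutorial text; recent POI releases are believed to
// call the shape class HSLFObjectShape and to expose data.getInputStream().
for (HSLFShape shape : slide.getShapes()) {
    if (shape instanceof OLEShape) {
        OLEShape ole = (OLEShape) shape;
        HSLFObjectData data = ole.getObjectData();
        String name = ole.getInstanceName();
        if ("Worksheet".equals(name)) {
            // Embedded Excel workbook (binary .xls format)
            HSSFWorkbook wb = new HSSFWorkbook(data.getData());
        } else if ("Document".equals(name)) {
            // Embedded Word document (binary .doc format)
            HWPFDocument doc = new HWPFDocument(data.getData());
        }
    }
}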

Note: You can use WorkbookFactory to abstract away the underlying data format (e.g. HSSFWorkbook vs. XSSFWorkbook).

For example:

if ("Worksheet".equals(name)) {
   // WorkbookFactory picks the HSSF or XSSF implementation from the stream contents
   Workbook wb = WorkbookFactory.create(data.getInputStream());
   Sheet sheet = wb.getSheetAt(0);
   for (Iterator<Row> rowIterator = sheet.iterator(); rowIterator.hasNext(); ) {
      ...
   }
}
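
The tutorial snippet above targets the binary .ppt format (HSLF), while the question is about PPTX. As a rough, POI-free sketch of where embedded workbooks usually sit in that case: a .pptx file is a ZIP package, and PowerPoint conventionally (not necessarily always) stores embedded objects under ppt/embeddings/. The class and method names below (PptxEmbeddings, sniff, listEmbedded) are made up for illustration, and the code assumes Java 9+ for InputStream.readNBytes. It lists those entries and inspects their leading bytes to tell an OLE2 container (as used by .xls) from a ZIP container (as used by .xlsx), which is the distinction WorkbookFactory hides.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

final class PptxEmbeddings {
    // Assumption: the conventional folder for embedded parts in a .pptx package.
    static final String EMBED_DIR = "ppt/embeddings/";

    // Standard file signatures: OLE2 compound document and ZIP local file header.
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    private static final byte[] ZIP_MAGIC = {'P', 'K', 3, 4};

    private static boolean startsWith(byte[] data, byte[] magic) {
        if (data.length < magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            if (data[i] != magic[i]) {
                return false;
            }
        }
        return true;
    }

    /** Returns "OLE2", "ZIP", or "UNKNOWN" based on the leading bytes. */
    static String sniff(byte[] head) {
        if (startsWith(head, OLE2_MAGIC)) {
            return "OLE2";
        }
        if (startsWith(head, ZIP_MAGIC)) {
            return "ZIP";
        }
        return "UNKNOWN";
    }

    /** Maps each entry under ppt/embeddings/ to its sniffed container type. */
    static Map<String, String> listEmbedded(InputStream pptx) throws IOException {
        Map<String, String> found = new LinkedHashMap<>();
        try (ZipInputStream zip = new ZipInputStream(pptx)) {
            for (ZipEntry e = zip.getNextEntry(); e != null; e = zip.getNextEntry()) {
                if (e.isDirectory() || !e.getName().startsWith(EMBED_DIR)) {
                    continue;
                }
                // Only the first few bytes of the entry are needed for the signature check.
                byte[] head = new byte[OLE2_MAGIC.length];
                int n = zip.readNBytes(head, 0, head.length);
                found.put(e.getName(), sniff(Arrays.copyOf(head, n)));
            }
        }
        return found;
    }
}
```

Calling PptxEmbeddings.listEmbedded(Files.newInputStream(Paths.get("deck.pptx"))) on a hypothetical deck.pptx would return the embedded entry names with their container type. The bytes of a matching entry could then be passed to WorkbookFactory.create(...), keeping in mind that an OLE2 entry may wrap something other than a workbook.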

Answer 2

Score: 0

import java.io.File;
import java.io.FileInputStream;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.xssf.usermodel.XSSFSheet;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

// Assumes POI 4.x or later, where getCellType() returns the CellType enum
// rather than the older Cell.CELL_TYPE_* int constants.
try (FileInputStream inStream = new FileInputStream(new File("path/to/file.xlsx"));
     XSSFWorkbook workbook = new XSSFWorkbook(inStream)) {
   XSSFSheet sheet = workbook.getSheetAt(0);

   // Sheet is Iterable<Row> and Row is Iterable<Cell>
   for (Row row : sheet) {
      for (Cell cell : row) {
         switch (cell.getCellType()) {
            case NUMERIC:
               System.out.print(cell.getNumericCellValue() + "\t");
               break;
            case STRING:
               System.out.print(cell.getStringCellValue() + "\t");
               break;
            ...
         }
      }
      System.out.println();
   }
}
