How to extract table data, such as Excel table cells and table rows, from a PPT using Apache POI

Question

I am currently parsing a PPTX document with Apache POI. I can extract images and embedded OLE objects from a slide (XSLFSlide), but I don't know how to extract embedded XLS tables from it. How can I parse and read the embedded XLS tables in a PPT slide?

Answer 1

Score: 1

From the POI docs at https://poi.apache.org/components/poifs/embeded.html:

> Files embedded in PowerPoint - PowerPoint does not normally store embedded files in the OLE2 layer. Instead, they are held within records of the main PowerPoint file. See the HSLF Tutorial for how to retrieve embedded OLE objects from a presentation.

> Opening embedded files - All of the POIDocument classes (HSSFWorkbook, HSLFSlideShow, HWPFDocument and HDGFDiagram) can either be opened from a POIFSFileSystem, or from a specific directory within a POIFSFileSystem. So, to open embedded files, simply locate the appropriate DirectoryNode that represents the subdirectory of interest, and pass this + the overall POIFSFileSystem to the constructor.

> If you want to extract the textual contents of the embedded file, then open the appropriate POIDocument, and then pass this to the extractor class, instead of simply passing the POIFSFilesystem to the extractor.

And from the HSLF Tutorial:

for (HSLFShape shape : slide.getShapes()) {
    // OLEShape is the name used in the tutorial; newer POI releases name this class
    // HSLFObjectShape and read the data via getInputStream()
    if (shape instanceof OLEShape) {
        OLEShape ole = (OLEShape) shape;
        HSLFObjectData data = ole.getObjectData();
        String name = ole.getInstanceName();
        if ("Worksheet".equals(name)) {
            // Embedded Excel sheet
            HSSFWorkbook wb = new HSSFWorkbook(data.getData());
        } else if ("Document".equals(name)) {
            // Embedded Word document
            HWPFDocument doc = new HWPFDocument(data.getData());
        }
    }
}
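
The tutorial code above targets the binary .ppt format (HSLF). The question is about .pptx, which is a ZIP package, so a quick way to check what a file actually embeds is to list its package parts. The sketch below uses only the JDK; the class and method names are made up for illustration, and the `ppt/embeddings/` prefix is an assumption about where PowerPoint usually places embedded objects (the package relationships are what officially link them to a slide).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

final class PptxEmbeddings {
    // Assumption: the folder PowerPoint typically uses for embedded objects.
    static final String EMBEDDINGS_DIR = "ppt/embeddings/";

    /** Returns the names of all ZIP entries stored under ppt/embeddings/. */
    static List<String> listEmbeddedParts(byte[] pptxBytes) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(pptxBytes))) {
            for (ZipEntry e = zip.getNextEntry(); e != null; e = zip.getNextEntry()) {
                if (!e.isDirectory() && e.getName().startsWith(EMBEDDINGS_DIR)) {
                    names.add(e.getName());
                }
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return names;
    }

    /** Builds a tiny in-memory ZIP with the given entry names (demonstration helper only). */
    static byte[] fakePackage(String... entryNames) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (String name : entryNames) {
                zip.putNextEntry(new ZipEntry(name));
                zip.write(new byte[] {0}); // placeholder content
                zip.closeEntry();
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return bytes.toByteArray();
    }
}
```

For a real file, the bytes could come from `Files.readAllBytes(...)`. Entries ending in .xlsx or .xls are good candidates for an embedded workbook, while .bin entries are usually generic OLE containers.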

Note: You can use WorkbookFactory to abstract away the underlying data format (e.g. HSSFWorkbook vs. XSSFWorkbook).

For example:

if ("Worksheet".equals(name)) {
   // WorkbookFactory chooses HSSFWorkbook or XSSFWorkbook based on the stream contents
   Workbook workbook = WorkbookFactory.create(data.getInputStream());
   Sheet sheet = workbook.getSheetAt(0);
   for (Iterator<Row> rowIterator = sheet.iterator(); rowIterator.hasNext(); ) {
      ...
   }
}
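
To see why WorkbookFactory can hide the format difference: as far as I know, it inspects the first bytes of the stream before picking an implementation. The sketch below shows that idea with the JDK only. `WorkbookSniffer` and its enum are illustrative names, not POI API, and POI's own detection covers more formats than these two signatures.

```java
import java.util.Arrays;

final class WorkbookSniffer {
    enum Container { OLE2, OOXML_ZIP, UNKNOWN }

    // Signature of an OLE2 compound file, the container used by .xls
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    // Signature of a ZIP local file header, the container used by .xlsx
    private static final byte[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};

    /** Classifies the data by its leading bytes. */
    static Container sniff(byte[] data) {
        if (startsWith(data, OLE2_MAGIC)) return Container.OLE2;
        if (startsWith(data, ZIP_MAGIC)) return Container.OOXML_ZIP;
        return Container.UNKNOWN;
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
                && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }
}
```

A ZIP signature only says the data is a ZIP archive, not that it is a workbook, so in practice it is simpler to hand the stream to WorkbookFactory and handle the failure case.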

Answer 2

Score: 0

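import java.io.File;
import java.io.FileInputStream;
import java.util.Iterator;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.xssf.usermodel.XSSFSheet;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

// Both the stream and the workbook are closed automatically
try (FileInputStream inStream = new FileInputStream(new File("path/to/file.xlsx"));
     XSSFWorkbook workbook = new XSSFWorkbook(inStream)) {
   XSSFSheet sheet = workbook.getSheetAt(0);

   // Walk every row, then every cell in that row
   for (Iterator<Row> rowIterator = sheet.iterator(); rowIterator.hasNext(); ) {
      Row row = rowIterator.next();
      for (Iterator<Cell> cellIterator = row.cellIterator(); cellIterator.hasNext(); ) {
         Cell cell = cellIterator.next();
         // With POI 4.0 and later, getCellType() returns the CellType enum;
         // older 3.x releases used int constants such as Cell.CELL_TYPE_NUMERIC
         switch (cell.getCellType()) {
            case NUMERIC:
               System.out.print(cell.getNumericCellValue() + "\t");
               break;
            case STRING:
               System.out.print(cell.getStringCellValue() + "\t");
               break;
            ...
         }
      }
      System.out.println();
   }
}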
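
The loop above reads a standalone .xlsx file. For the question's case, a workbook embedded in a PPTX slide, the two answers can be combined. The following is a minimal sketch, assuming POI 4.0 or later with poi-ooxml on the classpath. `EmbeddedSheetReader` and `readEmbeddedRows` are made-up names, and the "Excel.Sheet" prog-ID prefix is an assumption about how PowerPoint labels embedded Excel objects. Shapes nested inside groups are not visited, and formula cells are returned as formula text because no evaluator is supplied.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.DataFormatter;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.ss.usermodel.WorkbookFactory;
import org.apache.poi.xslf.usermodel.XMLSlideShow;
import org.apache.poi.xslf.usermodel.XSLFObjectShape;
import org.apache.poi.xslf.usermodel.XSLFShape;
import org.apache.poi.xslf.usermodel.XSLFSlide;

final class EmbeddedSheetReader {

    /** Collects the rows of the first sheet of every embedded Excel object as cell text. */
    static List<List<String>> readEmbeddedRows(XMLSlideShow ppt) {
        List<List<String>> rows = new ArrayList<>();
        DataFormatter formatter = new DataFormatter();
        for (XSLFSlide slide : ppt.getSlides()) {
            for (XSLFShape shape : slide.getShapes()) {
                if (!(shape instanceof XSLFObjectShape)) {
                    continue; // not an OLE object
                }
                XSLFObjectShape ole = (XSLFObjectShape) shape;
                String progId = ole.getProgId();
                // Assumption: embedded Excel objects carry a prog ID such as "Excel.Sheet.12"
                if (progId == null || !progId.startsWith("Excel.Sheet")) {
                    continue;
                }
                try (InputStream in = ole.getObjectData().getInputStream();
                     Workbook wb = WorkbookFactory.create(in)) {
                    Sheet sheet = wb.getSheetAt(0);
                    for (Row row : sheet) {
                        List<String> cells = new ArrayList<>();
                        for (Cell cell : row) {
                            cells.add(formatter.formatCellValue(cell));
                        }
                        rows.add(cells);
                    }
                } catch (IOException ex) {
                    throw new UncheckedIOException(ex);
                }
            }
        }
        return rows;
    }
}
```

A caller would open the presentation with `new XMLSlideShow(inputStream)`, pass it to `readEmbeddedRows`, and close the slide show afterwards.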

huangapple
  • Posted on July 31, 2023, 20:28:59
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76803642.html