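Parsing PPTX in PHP: need to find the coordinates of attached images

Question

I am trying to get the X, Y coordinates of the images contained in a PPTX file that is uploaded through PHP. We are on a shared server, so we are limited in which modules we can use.

I could not find any information on this anywhere. I searched the community but was unable to find a solution.

Any help would be appreciated.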

function pptx_extract_images() {
    // Get the submitted POST data
    $data = filter_input_array(INPUT_POST);
    $attachment_id = $data['attachment_id'];

    // Get the attachment file and its name
    $input_file = get_attached_file($attachment_id);
    $input_file_name = pathinfo($input_file)["filename"];

    // Get the attachment type
    $attachment_type = pathinfo($input_file, PATHINFO_EXTENSION);

    // Create a ZipArchive handle for the pptx package
    $package = new \ZipArchive();

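    // Bail out if the file cannot be opened as a zip/pptx archive
    // (ZipArchive::open() returns true on success, an error code otherwise)
    if ($package->open($input_file) !== true) {
        return;
    }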

    // Read the relationships and search for images
    $relationsXml = $package->getFromName('_rels/.rels');

    if ($relationsXml === false) {
        $logger->write_log('pptx_extract_images', $relationsXml, 'Invalid archive or corrupted .pptx file.');
        throw new RuntimeException('Invalid archive or corrupted .pptx file.');
    }

    $relations = simplexml_load_string($relationsXml);

    function absoluteZipPath($path) {
        $path = str_replace(array('/', '\\'), DIRECTORY_SEPARATOR, $path);
        $parts = array_filter(explode(DIRECTORY_SEPARATOR, $path), 'strlen');

        $absolutes = array();

        foreach ($parts as $part) {
            if ('.' == $part) continue;

            if ('..' == $part) {
                array_pop($absolutes);
            } else {
                $absolutes[] = $part;
            }
        }

        return implode('/', $absolutes);
    }

    // Document data holders
    $slides = 0;
    $data = array();

    foreach ($relations->Relationship as $rel) {
        if ($rel["Type"] == 'http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument') {

            $slideRelations = simplexml_load_string($package->getFromName(absoluteZipPath(dirname($rel["Target"]) . "/_rels/" . basename($rel["Target"]) . ".rels")));

            foreach ($slideRelations->Relationship as $slideRel) {
                if ($slideRel["Type"] == 'http://schemas.openxmlformats.org/officeDocument/2006/relationships/slide') {

                    $slideNotesRelations = simplexml_load_string($package->getFromName(absoluteZipPath(dirname($rel["Target"]) . "/" . dirname($slideRel["Target"]) . "/_rels/" . basename($slideRel["Target"]) . ".rels")));

                    $slideNo = isset($slideRel["Target"]) ? str_replace('slide', '', pathinfo($slideRel["Target"])["filename"]) : null;

                    foreach ($slideNotesRelations->Relationship as $slideImageRel) {
                        if ($slideImageRel["Type"] == 'http://schemas.openxmlformats.org/officeDocument/2006/relationships/image') {

                            $image = basename($slideImageRel["Target"]);
                            $image_mime = explode('.', $image);

                            $count_explode = count($image_mime);
                            $image_mime = strtolower($image_mime[$count_explode - 1]);

                            if ($image_mime == 'gif') {
                                $data['page_' . ($slideNo - 1)]['animated_image'] = $attachment_id  . '-' .  $input_file_name . '/image-' . $image;
                            }
                        }
                    }

                    $slides++;
                }
            }
        }
    }

    $package->close();
}

Thanks for looking into it, much appreciated.


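Answer 1

Score: 1

This can be achieved using the PHPPresentation library. You mentioned being limited in the modules you can use, but that should not be an issue here: you can download the release archives and require them directly (no need for Composer). The PHPPresentation documentation is a bit lacking, but the bundled examples are good for figuring out how things work.

Here is what the pptx looks like:

[Screenshot of the example pptx]

The code: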

<?php

/* 

Question Author: Ali Hussain
Question Answerer: Jacob Mulquin
Question: Parsing PPTX in PHP, Need to find coordinates to attached images
URL: https://stackoverflow.com/questions/76454701/parsing-pptx-in-php-need-to-find-coordinates-to-attached-images
Tags: php, google-slides

*/

require_once 'PHPPresentation-1.0.0/src/PhpPresentation/Autoloader.php';
\PhpOffice\PhpPresentation\Autoloader::register();
require_once 'Common-1.0.1/src/Common/Autoloader.php';
\PhpOffice\Common\Autoloader::register();

$file = 'images.pptx';

$pptReader = PhpOffice\PhpPresentation\IOFactory::createReader('PowerPoint2007');
$oPHPPresentation = $pptReader->load($file);

function getShapeDetails($shape, $slide_number)
{
    $width = $shape->getWidth();
    $height = $shape->getHeight();

    if ($width === 0 || $height === 0) {
        return [];
    }

    $name = '';
    $description = '';

    if ($shape instanceof PhpOffice\PhpPresentation\Shape\Drawing\Gd) {
        $name = $shape->getName();
        $description = $shape->getDescription();
    }

    return [
        'slide_number' => $slide_number+1,
        'hashcode' => $shape->getHashCode(),
        'offsetX' => $shape->getOffsetX(),
        'offsetY' => $shape->getOffsetY(),
        'width' => $width,
        'height' => $height,
        'name' => $name,
        'description' => $description
    ];
}

$images = [];
foreach ($oPHPPresentation->getAllSlides() as $slide_number => $oSlide) {
    foreach ($oSlide->getShapeCollection() as $oShape) {
        if ($oShape instanceof PhpOffice\PhpPresentation\Shape\Group) {
            foreach ($oShape->getShapeCollection() as $oShapeChild) {
                $images[] = getShapeDetails($oShapeChild, $slide_number);
            }
        } else {
            $images[] = getShapeDetails($oShape, $slide_number);
        }
    }
}

// Remove shapes that probably aren't images
$images = array_values(array_filter($images));

var_dump($images);

Yields:

array(4) {
  [0]=>
  array(8) {
    ["slide_number"]=>
    int(1)
    ["hashcode"]=>
    string(32) "e2b4ed359604645d2e483f037ef81b55"
    ["offsetX"]=>
    int(53)
    ["offsetY"]=>
    int(19)
    ["width"]=>
    int(468)
    ["height"]=>
    int(236)
    ["name"]=>
    string(9) "Picture 4"
    ["description"]=>
    string(89) "A picture containing smile, yellow, smiley, emoticon

Description automatically generated"
  }
  [1]=>
  array(8) {
    ["slide_number"]=>
    int(1)
    ["hashcode"]=>
    string(32) "6ea656b2a8426a6d6f2393391056842f"
    ["offsetX"]=>
    int(925)
    ["offsetY"]=>
    int(148)
    ["width"]=>
    int(225)
    ["height"]=>
    int(225)
    ["name"]=>
    string(9) "Picture 6"
    ["description"]=>
    string(83) "A picture containing design, font, logo, white

Description automatically generated"
  }
  [2]=>
  array(8) {
    ["slide_number"]=>
    int(2)
    ["hashcode"]=>
    string(32) "3ddc522c60417b61b8a5f316b0f29dc2"
    ["offsetX"]=>
    int(363)
    ["offsetY"]=>
    int(235)
    ["width"]=>
    int(402)
    ["height"]=>
    int(269)
    ["name"]=>
    string(21) "Content Placeholder 6"
    ["description"]=>
    string(102) "A group of men playing baseball in a field

Description automatically generated with medium confidence"
  }
  [3]=>
  array(8) {
    ["slide_number"]=>
    int(3)
    ["hashcode"]=>
    string(32) "cbefccdfc4db5e355f96bdc8d294296e"
    ["offsetX"]=>
    int(365)
    ["offsetY"]=>
    int(245)
    ["width"]=>
    int(870)
    ["height"]=>
    int(457)
    ["name"]=>
    string(21) "Content Placeholder 4"
    ["description"]=>
    string(90) "A picture containing text, font, screenshot, graphics

Description automatically generated"
  }
}
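
If loading a library is not an option at all, the same numbers can in principle be read straight from the slide XML with the ZipArchive and DOM extensions the question already relies on. The following is only a minimal sketch under some assumptions: pictures are stored as p:pic elements whose position sits in p:spPr/a:xfrm (a:off for the offset, a:ext for the size), the values are in EMU, and a pixel is taken to be 9525 EMU (96 DPI). The names extractPictureFrames, slidePictureFrames and EMU_PER_PIXEL are made up for this example and are not part of any library. Slide parts are assumed to be named ppt/slides/slideN.xml, which is the usual layout but not guaranteed; the relationship walking from the question is the more robust way to locate them.

```php
<?php
// Sketch only: reads picture frames from raw slide XML, without PHPPresentation.
// All function and constant names here are hypothetical helpers for illustration.

const EMU_PER_PIXEL = 9525; // 914400 EMU per inch / 96 DPI (the DPI is an assumption)

const NS_P = 'http://schemas.openxmlformats.org/presentationml/2006/main';
const NS_A = 'http://schemas.openxmlformats.org/drawingml/2006/main';
const NS_R = 'http://schemas.openxmlformats.org/officeDocument/2006/relationships';

// Returns one entry per <p:pic> that carries its own <a:xfrm>.
function extractPictureFrames(string $slideXml): array
{
    $dom = new DOMDocument();
    if (!$dom->loadXML($slideXml)) {
        return [];
    }

    $xpath = new DOMXPath($dom);
    $xpath->registerNamespace('p', NS_P);
    $xpath->registerNamespace('a', NS_A);

    $frames = [];
    foreach ($xpath->query('//p:pic') as $pic) {
        $off = $xpath->query('p:spPr/a:xfrm/a:off', $pic)->item(0);
        $ext = $xpath->query('p:spPr/a:xfrm/a:ext', $pic)->item(0);
        if ($off === null || $ext === null) {
            // Position is inherited (e.g. from a layout placeholder); not resolved here.
            continue;
        }
        $blip = $xpath->query('p:blipFill/a:blip', $pic)->item(0);
        $meta = $xpath->query('p:nvPicPr/p:cNvPr', $pic)->item(0);

        // Note: for pictures inside a group (p:grpSp) these values are in the
        // group's child coordinate space, not slide coordinates.
        $frames[] = [
            'name'    => $meta ? $meta->getAttribute('name') : '',
            'relId'   => $blip ? $blip->getAttributeNS(NS_R, 'embed') : '',
            'offsetX' => (int) round((int) $off->getAttribute('x') / EMU_PER_PIXEL),
            'offsetY' => (int) round((int) $off->getAttribute('y') / EMU_PER_PIXEL),
            'width'   => (int) round((int) $ext->getAttribute('cx') / EMU_PER_PIXEL),
            'height'  => (int) round((int) $ext->getAttribute('cy') / EMU_PER_PIXEL),
        ];
    }

    return $frames;
}

// Collects frames per slide, assuming parts are named ppt/slides/slide1.xml, slide2.xml, ...
function slidePictureFrames(string $pptxPath): array
{
    $zip = new ZipArchive();
    if ($zip->open($pptxPath) !== true) {
        return [];
    }

    $result = [];
    for ($i = 1; ($xml = $zip->getFromName("ppt/slides/slide{$i}.xml")) !== false; $i++) {
        $result[$i] = extractPictureFrames($xml);
    }
    $zip->close();

    return $result;
}
```

Calling slidePictureFrames('images.pptx') would return an array keyed by slide number. The relId value (for example rId2) can be matched against the Id attribute in the slide's .rels file, which the code in the question already reads, to find out which media file each frame belongs to.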

Posted by huangapple on 2023-06-12 16:12:51. Link: https://go.coder-hub.com/76454701.html