How to extract linked images from an HTML page with PHP/regexp
Question
I'm looking for some PHP code or a regular expression (I'm not that skilled with regexp) to extract just the linked images from an HTML file. In other words, just the chunks of HTML that look like this:
<a href=...><img src=...></a>
I know how to extract images and links separately:
$links = $dom->getElementsByTagName('a');
$images = $dom->getElementsByTagName('img');
but not how to extract the two tags when one is nested inside the other. Googling has not turned up anything useful either. Is what I want to do uncommon, or very difficult?
Could you help me? Thanks.
Answer 1
Score: 1
You could use the following XPath query:
//a[./img]
which means any <a> element that has an <img> as its direct child.
Using PHP's DOM API, this would look like this:
$domDocument = new \DOMDocument();
$domDocument->loadHTML($html);
$xpath = new DOMXPath($domDocument);
$imageLinks = $xpath->query('//a[./img]');
Demo: https://3v4l.org/GXAbC
If the image can be further down the DOM tree, you can change the XPath query to this:
//a[.//img]
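To see the difference between the two queries, here is a minimal, self-contained sketch. The sample markup and the helper name linkedImages() are made up for illustration and are not part of the original answer; libxml_use_internal_errors() is only there to keep the parser quiet on imperfect real-world HTML.

```php
<?php
// Hypothetical sample markup: a direct link-image, a nested one,
// a text-only link, and an image that is not linked at all.
$html = <<<HTML
<html><body>
<a href="/one"><img src="one.png"></a>
<a href="/two"><span><img src="two.png"></span></a>
<a href="/three">text only</a>
<img src="four.png">
</body></html>
HTML;

libxml_use_internal_errors(true); // suppress parser warnings on sloppy HTML
$domDocument = new DOMDocument();
$domDocument->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($domDocument);

// Hypothetical helper: returns href => src for every <a> matched by $query.
function linkedImages(DOMXPath $xpath, string $query): array
{
    $result = [];
    foreach ($xpath->query($query) as $a) {
        // First <img> anywhere below this <a>, relative to the <a> node
        $img = $xpath->query('.//img', $a)->item(0);
        $result[$a->getAttribute('href')] = $img->getAttribute('src');
    }
    return $result;
}

$direct = linkedImages($xpath, '//a[./img]');  // <img> must be a direct child
$nested = linkedImages($xpath, '//a[.//img]'); // <img> may be any descendant

print_r($direct);
print_r($nested);
```

With this markup, the first query should match only the /one link, while the second should also pick up /two, where the image is wrapped in a <span>.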
Answer 2
Score: 0
A solution without XPath could be:
$links = $domDocument->getElementsByTagName('a');
foreach ($links as $link) {
    $img = $link->getElementsByTagName('img');
    // skip links that do not contain an image
    if ($img->length === 0) {
        continue;
    }
    // first <img> in the DOMNodeList
    print_r($img->item(0));
}
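Since the question asks for the chunk of HTML itself, one possible extension of this loop is to serialize each matching link with DOMDocument::saveHTML(). The following is only a sketch under assumptions: the sample markup and the $chunks variable are invented for illustration. Note that getElementsByTagName() searches all descendants, so this behaves like the //a[.//img] query rather than the direct-child one.

```php
<?php
// Hypothetical sample markup: one linked image and one text-only link.
$html = '<html><body><p><a href="/one"><img src="one.png"></a> '
      . '<a href="/two">text only</a></p></body></html>';

libxml_use_internal_errors(true); // suppress parser warnings on sloppy HTML
$domDocument = new DOMDocument();
$domDocument->loadHTML($html);
libxml_clear_errors();

$chunks = [];
foreach ($domDocument->getElementsByTagName('a') as $link) {
    // Keep only links that contain at least one <img> at any depth
    if ($link->getElementsByTagName('img')->length > 0) {
        // Serialize just this <a> element, including its children
        $chunks[] = $domDocument->saveHTML($link);
    }
}

print_r($chunks);
```

Each entry of $chunks is then the markup of one linked image as re-serialized by the parser, which may differ slightly in formatting from the original source text.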