Finding URL from <a> using PHP
Question
I need to find every URL that appears in the "href" attribute of an HTML <a> tag using PHP.
As a result, I want to get an array of every URL. I tried the code below, but it only finds the starting "href=" part. I know my code is very basic, but I don't know how to improve or change it to make it work. Thanks for any help.
<?php
$array = [];
$string = file_get_contents("file.html");
$begin = 0;
$end = 0;
do {
    $begin = strpos($string, "<a href=\"", $end + 1);
    $end = strpos($string, "\"", $begin + 6);
    $array[] = substr($string, ($begin + 6), ($end - $begin - 6));
} while ($begin !== false && $end !== false);
Answer 1
Score: 1
Use DOMDocument for that, not Regex!
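$html = file_get_contents('file.html');

// Collect libxml parse warnings internally; malformed or HTML5 markup would otherwise emit them.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);

// Select every <a> element in the document.
$xpath = new DOMXPath($dom);
$tags = $xpath->query('//a');

$links = [];
foreach ($tags as $tag) {
    // getAttribute() returns an empty string when the anchor has no href.
    $links[] = $tag->getAttribute('href');
}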
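To try the approach without a file on disk, here is a minimal sketch that wraps the same DOMDocument/DOMXPath calls in a function and runs it on an inline string. The function name `extractHrefs` and the sample markup are made up for illustration and are not part of the answer above. The sketch assumes you only want anchors that actually have an href, so it queries `//a[@href]` instead of `//a`. It also covers a case the `strpos` search for `<a href="` cannot match: an anchor whose href is not the first attribute.

```php
<?php
// Hypothetical helper (not from the original answer): collect the href value
// of every <a> element in an HTML string.
function extractHrefs(string $html): array
{
    if (trim($html) === '') {
        return []; // loadHTML() rejects empty input
    }

    // Collect parse warnings internally instead of emitting them for sloppy markup,
    // then restore whatever setting was active before.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $dom->loadHTML($html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $links = [];
    // Only anchors that actually carry an href attribute.
    foreach ($xpath->query('//a[@href]') as $anchor) {
        $links[] = $anchor->getAttribute('href');
    }
    return $links;
}

// Sample markup (an assumption for this example): one plain link, one anchor
// without href, and one link where href is not the first attribute.
$sample = '<p><a href="https://example.com/a">A</a> <a name="x">no link</a> '
        . '<a class="btn" href="/b?x=1&amp;y=2">B</a></p>';

print_r(extractHrefs($sample));
```

For the sample string this returns the two href values and skips the anchor that has only a name attribute. Attribute values come back entity-decoded, so `&amp;` in the markup becomes `&` in the result.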