Finding URLs in <a> tags using PHP
Question

I need to find every URL that appears in the "href" attribute of an HTML <a> tag, using PHP.

As a result, I want to get an array of all the URLs. I tried the code below, but it only finds the start of the "href=" part instead of the URL itself. I know my code is very basic, but I don't know how to improve or change it to make it work. Thanks for any help.

<?php

$array = [];
$string = file_get_contents("file.html");
$begin = 0;
$end = 0;

do {
    $begin = strpos($string, "<a href=\"", $end + 1);
    $end = strpos($string, "\"", $begin + 6);
    $array[] = substr($string, ($begin + 6), ($end - $begin - 6));
} while ($begin !== false && $end !== false);
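For reference, the loop in the question appears to go wrong in two places. The search string `<a href="` is 9 characters long, but the code skips only 6, so the search for the closing quote stops at the opening quote and the extracted substring is just `f=`. The `do ... while` form also appends one more entry after `strpos` has already returned `false`. The following is a minimal sketch of the same string-search approach with those two points changed. The function name `findHrefs` is hypothetical, and the sketch assumes every link is written exactly as `<a href="...">` with double quotes and `href` as the first attribute.

```php
<?php

/**
 * Hypothetical helper: collects the value of every <a href="..."> found
 * by plain string search. Assumes double quotes and href as first attribute.
 */
function findHrefs(string $html): array
{
    $needle = '<a href="';
    $needleLength = strlen($needle); // 9 characters, not 6
    $urls = [];
    $offset = 0;

    // Test the strpos() result before using it, so nothing is appended
    // once no further match exists.
    while (($begin = strpos($html, $needle, $offset)) !== false) {
        $start = $begin + $needleLength;   // first character of the URL
        $end = strpos($html, '"', $start); // closing quote of the attribute
        if ($end === false) {
            break; // unterminated attribute: stop scanning
        }
        $urls[] = substr($html, $start, $end - $start);
        $offset = $end + 1; // continue after the closing quote
    }

    return $urls;
}
```

A call such as `findHrefs(file_get_contents("file.html"))` would return the array of URLs. Links written with single quotes, extra whitespace, or other attributes before `href` are not matched by this sketch.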

Answer 1

Score: 1

Use DOMDocument for that, not regex!

$html = file_get_contents('file.html');

// Parse the HTML into a DOM tree.
$dom = new DOMDocument();
$dom->loadHTML($html);

$xpath = new DOMXPath($dom);

// Select every <a> element in the document.
$tags = $xpath->query('//a');
$links = [];

foreach ($tags as $tag) {
    // getAttribute() returns an empty string when the attribute is missing.
    $links[] = $tag->getAttribute('href');
}
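Real-world HTML is often not well formed, and some <a> elements carry no href at all. The sketch below is one possible variation on the DOMDocument approach under those assumptions: it silences libxml parse warnings while loading and uses the XPath expression `//a[@href]` so that anchors without an href are skipped. The function name `extractLinks` is hypothetical and is not part of the original answer.

```php
<?php

/**
 * Hypothetical helper: returns the href value of every <a> element
 * that has an href attribute.
 */
function extractLinks(string $html): array
{
    if (trim($html) === '') {
        return []; // loadHTML() rejects an empty string on PHP 8
    }

    // Collect libxml warnings internally instead of emitting them,
    // then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $links = [];

    // Only anchors that actually carry an href attribute are selected.
    foreach ($xpath->query('//a[@href]') as $anchor) {
        $links[] = $anchor->getAttribute('href');
    }

    return $links;
}
```

Called as `extractLinks(file_get_contents('file.html'))`, this would return a flat array of href strings in document order. Relative URLs are returned as written; resolving them against a base URL is left out of the sketch.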

Posted by huangapple on February 8, 2023 at 20:41:47
Source: https://go.coder-hub.com/75385949.html