Question

I need to find every URL that appears in the "href" attribute of an HTML <a> tag using PHP.

As a result, I want an array containing every URL. I tried the code below, but it only finds the beginning of the "href=" part. I know my code is very basic, but I don't know how to improve or change it to make it work. Thanks for any help.

<?php

$array = [];
$string = file_get_contents("file.html");
$begin = 0;
$end = 0;

do {
    $begin = strpos($string, "<a href=\"", $end + 1);
    $end = strpos($string, "\"", $begin + 6);
    $array[] = substr($string, ($begin + 6), ($end - $begin - 6));
} while ($begin !== false && $end !== false);
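
Two details in the loop above likely explain the result. The prefix `<a href="` is 9 characters long, so an offset of 6 lands inside `href` and the next quote found is the opening one, which leaves only `f=`. The return values of `strpos` are also used before they are checked against `false`. Below is a minimal sketch of the same string-search idea with both points addressed. It assumes every link is written exactly as `<a href="...">` (double quotes, `href` as the first attribute), and the function name `findHrefsByStrpos` is hypothetical.

```php
<?php

// Hypothetical helper: collects href values using plain string search.
// Assumption: each link starts with the exact text `<a href="`.
function findHrefsByStrpos(string $html): array
{
    $needle = '<a href="';
    $skip = strlen($needle); // 9 characters, not 6
    $urls = [];
    $offset = 0;

    // Check the strpos result before using it, and stop when nothing is found.
    while (($begin = strpos($html, $needle, $offset)) !== false) {
        $start = $begin + $skip;            // first character of the URL
        $end = strpos($html, '"', $start);  // closing quote of the attribute
        if ($end === false) {
            break; // unterminated attribute value
        }
        $urls[] = substr($html, $start, $end - $start);
        $offset = $end + 1;                 // continue after this link
    }

    return $urls;
}

$sample = '<p><a href="https://example.com/a">A</a> <a href="/b">B</a></p>';
print_r(findHrefsByStrpos($sample));
```

This still misses links that use single quotes, uppercase tags, or other attributes before `href`, which is why a real HTML parser is usually the safer choice.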

Answer 1

Score: 1

Use DOMDocument for that, not Regex!

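$html = file_get_contents('file.html');

// Collect libxml parse warnings instead of emitting them for imperfect HTML.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);

$xpath = new DOMXPath($dom);

// Select only <a> elements that actually carry an href attribute.
$tags = $xpath->query('//a[@href]');
$links = [];

foreach ($tags as $tag) {
    $links[] = $tag->getAttribute('href');
}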
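
To try the DOM approach without a `file.html` on disk, the same idea can be wrapped in a function that takes an HTML string. This is only a sketch: the name `extractHrefs` and the sample markup are made up for illustration, and it uses `getElementsByTagName` in place of XPath, which should be equivalent for this purpose.

```php
<?php

// Hypothetical helper: returns the href of every <a> element in $html.
// Note: $html must be non-empty; loadHTML rejects an empty string in PHP 8.
function extractHrefs(string $html): array
{
    // Collect parser warnings internally instead of printing them;
    // real-world HTML is often not well-formed.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $dom->loadHTML($html);

    // Discard collected warnings and restore the previous error mode.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($dom->getElementsByTagName('a') as $anchor) {
        // Skip anchors that have no href (e.g. named anchors).
        if ($anchor->hasAttribute('href')) {
            $links[] = $anchor->getAttribute('href');
        }
    }

    return $links;
}

$sample = <<<HTML
<p>
  <a class="nav" href='/single-quoted'>one</a>
  <A HREF="https://example.com/upper">two</A>
  <a name="anchor-without-href">three</a>
</p>
HTML;

print_r(extractHrefs($sample));
```

For the sample markup this should yield `/single-quoted` and `https://example.com/upper`: the parser handles single quotes, uppercase tag names, and attributes in any order, which a plain string search does not.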

