Extract links for a specific domain from the HTML of a website
Question
Below is my code to extract links from a given URL. My issue is that when I view the source of the given URL, there is a link with the domain https://fs1.pdisk.pro:183, but when I extract the links, it does not appear.
<?php
function extractLinks($url) {
    // Get the HTML content of the page.
    $html = file_get_contents($url);
    // Create a DOMDocument object.
    $dom = new DOMDocument();
    @$dom->loadHTML($html);
    // Get all the <source> elements.
    $anchors = $dom->getElementsByTagName('source');
    // Create an array to store the links.
    $links = array();
    // Loop through the <source> elements.
    foreach ($anchors as $anchor)
    {
        // Get the src attribute of the <source> element.
        $href = $anchor->getAttribute('src');
        // Add the link to the links array.
        $links[] = $href;
    }
    // Return the links as a plain array; it is JSON-encoded once, below.
    return $links;
}
// Get the URL of the website to extract links from.
$url = 'http://pdisk.investro1.com/how-to-buy-life-insurance-online-qfevac8cq8x4.html';
// Extract the links from the website.
$links = extractLinks($url);
// Print the links in JSON format.
echo json_encode($links);
Can someone help me extract all the links for the needed domain from the given URL and, if possible, redirect to the extracted link and return the response in JSON format, like url=link?
Answer 1
Score: 0
You are asking for code to scrape a website.
Getting certain content without the source owner's consent can be illegal.
That said, the link with port :183 is not under an <a> tag. It is under a <source> tag nested inside a <video> tag (<video> --> <source>).
Please change the line
$anchors = $dom->getElementsByTagName('a');
to
$anchors = $dom->getElementsByTagName('source');
Also change the line
$href = $anchor->getAttribute('href');
to
$href = $anchor->getAttribute('src');
Beware:
Web scraping requires the owner's permission to extract data from the source website.
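The question also asks for only the links of one domain, returned as JSON in a url=link shape. A minimal sketch of that, assuming you have permission to fetch the page: the helper name extractLinksForHost, the sample markup, and the video path in it are made up for illustration and are not from the original post. It reads both <source src> and <a href>, then keeps the URLs whose host matches.

```php
<?php
// Hypothetical helper (not from the original post): collect URLs from
// <source src> and <a href> in an HTML string and keep only those whose
// host equals $host.
function extractLinksForHost(string $html, string $host): array
{
    $dom = new DOMDocument();
    // Collect parser warnings internally instead of printing them;
    // real-world markup is often malformed.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    // Tag name => attribute that carries the URL.
    foreach (['source' => 'src', 'a' => 'href'] as $tag => $attribute) {
        foreach ($dom->getElementsByTagName($tag) as $element) {
            $url = $element->getAttribute($attribute);
            // parse_url() returns the host without the port.
            $urlHost = parse_url($url, PHP_URL_HOST);
            if (is_string($urlHost) && strcasecmp($urlHost, $host) === 0) {
                $links[] = $url;
            }
        }
    }
    // Drop duplicates and reindex from 0.
    return array_values(array_unique($links));
}

// Made-up sample markup standing in for the fetched page.
$sampleHtml = '<html><body>'
    . '<a href="https://example.com/about">About</a>'
    . '<video><source src="https://fs1.pdisk.pro:183/d/sample/video.mp4" type="video/mp4"></video>'
    . '</body></html>';

$links = extractLinksForHost($sampleHtml, 'fs1.pdisk.pro');
// Encode once, in the url=link shape the question asks for.
echo json_encode(['url' => $links[0] ?? null], JSON_UNESCAPED_SLASHES), PHP_EOL;
```

For the real page, the HTML string would come from file_get_contents($url) as in the question. To redirect instead of printing JSON, one option is header('Location: ' . $links[0]); followed by exit;, sent before any other output.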