How to obtain all URLs from an external page?
Question
My goal is to obtain all URLs from [this page][1].
This is what I have tried:
```php
<?php
$url = 'https://www.coop.ch/de/aktionen/wochenaktionen/aktionen-fleisch-fisch/c/m_1380?q=' . urlencode(':relevance') . '&sort=specialOffers&pageSize=10000&page=1';
$html = file_get_contents($url);
$dom = new DOMDocument();
// Suppress libxml warnings caused by malformed real-world HTML.
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();
// Print the href of every anchor element found in the static HTML.
$links = $dom->getElementsByTagName('a');
foreach ($links as $index => $link) {
    $href = $link->getAttribute('href');
    echo "$href <br>";
}
```
That gives me some of the URLs, but not all of them. Most likely this is because some of the content is loaded dynamically.
I also tried the JavaScript Fetch API, but that does not work because of the CORS policy.
How do I get the URLs?
Update
The comment from [KIKO Software][2] resolved this issue. But I would also like to obtain the URLs from [this website][3]. This is a bit more difficult, since I don't even see the URLs in the source code, probably because the website is based on Angular. Is there a way to handle this?
[1]: https://www.coop.ch/de/aktionen/wochenaktionen/aktionen-fleisch-fisch/c/m_1380?q=%3Arelevance&sort=specialOffers&pageSize=10000&page=1
[2]: https://stackoverflow.com/users/3986005/kiko-software
[3]: https://www.migros.ch/de/offers/home?context=instore&gad=1&gclid=Cj0KCQjw2eilBhCCARIsAG0Pf8tJLOLm9ncwTQe0b-h2aWV2GIr7iQeEg8cAO8GK6eCl5ggnyLxBnPUaAlKWEALw_wcB
Answer 1
Score: 1
I think the URLs you're interested in are located in a script element that is used as a data block and contains JSON-LD.
Now, I agree that you would normally use DOMDocument to parse HTML, but perhaps a simple string extraction would do the job here.
Here is my attempt:
```php
$url = 'https://www.coop.ch/de/aktionen/wochenaktionen/aktionen-fleisch-fisch/c/m_1380?q=' . urlencode(':relevance') . '&sort=specialOffers&pageSize=10000&page=1';
$html = file_get_contents($url);
$json = str_before('',
    str_after('
```
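
For comparison, the same idea can be sketched with DOMDocument instead of string extraction. This is only an illustration under stated assumptions: it assumes the data block is a `script` element of type `application/ld+json` and that the links are stored under `url` keys. The function name `extractJsonLdUrls` and the sample markup are hypothetical and are not taken from the coop.ch page.

```php
<?php
// Hypothetical helper (not part of the original answer): collect every string
// stored under a "url" key in the JSON-LD data blocks of an HTML document.
function extractJsonLdUrls(string $html): array
{
    if ($html === '') {
        return [];                       // DOMDocument::loadHTML rejects empty input
    }

    $dom = new DOMDocument();
    libxml_use_internal_errors(true);    // tolerate malformed real-world HTML
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $urls = [];
    foreach ($xpath->query('//script[@type="application/ld+json"]') as $script) {
        $data = json_decode($script->textContent, true);
        if (!is_array($data)) {
            continue;                    // skip blocks that are not valid JSON
        }
        // Visit every leaf value and keep the strings whose key is "url".
        array_walk_recursive($data, function ($value, $key) use (&$urls) {
            if ($key === 'url' && is_string($value)) {
                $urls[] = $value;
            }
        });
    }
    return array_values(array_unique($urls));
}

// Made-up sample markup; the real page structure may differ.
$sample = <<<'PAGE'
<html><body>
<script type="application/ld+json">
{"@type": "ItemList", "itemListElement": [
  {"@type": "ListItem", "url": "https://example.com/p/1"},
  {"@type": "ListItem", "url": "https://example.com/p/2"}
]}
</script>
</body></html>
PAGE;

print_r(extractJsonLdUrls($sample));
```

For a real page, the string returned by `file_get_contents($url)` would be passed in place of `$sample`. This only works when the JSON-LD is present in the HTML the server sends; content that a client-side framework adds later would not be seen.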
Comments