Golang: How to download a page from Internet with absolute links in html

Question
From this:
<head>
<link rel="stylesheet" href="styles.css">
</head>
<body>
<img src="img.jpg" alt="" width="500" height="600">
I want to get this:
<head>
<link rel="stylesheet" href="http://bbc.com/styles.css">
</head>
<body>
<img src="http://bbc.com/img.jpg" alt="" width="500" height="600">
When I download a page, it contains relative links to CSS, images, etc. How can I convert the HTML while downloading so that all links in it are absolute rather than relative? I use this answer to download a page (https://stackoverflow.com/questions/40643030/how-to-get-webpage-content-into-a-string-using-go):
package main

import (
	"fmt"
	"io/ioutil"
	"log"
	"net/http"
)

func main() {
	s := OnPage("http://bbc.com/")
	fmt.Print(s) // Print, not Printf: the page body may contain % verbs
}

// OnPage fetches link and returns the response body as a string.
func OnPage(link string) string {
	res, err := http.Get(link)
	if err != nil {
		log.Fatal(err)
	}
	content, err := ioutil.ReadAll(res.Body)
	res.Body.Close()
	if err != nil {
		log.Fatal(err)
	}
	return string(content)
}
Answer 1
Score: 1
You have to use regular expressions to replace the needed portions of the HTML string. Here is how you can do it (I assume all links on the page are relative; if not, you should adjust the code):
package main

import (
	"fmt"
	"io/ioutil"
	"log"
	"net/http"
	"regexp"
)

func main() {
	s := OnPage("http://bbc.com/")
	fmt.Print(s)
}

func OnPage(link string) string {
	res, err := http.Get(link)
	if err != nil {
		log.Fatal(err)
	}
	content, err := ioutil.ReadAll(res.Body)
	res.Body.Close()
	if err != nil {
		log.Fatal(err)
	}
	html := string(content)
	// Prefix img src attributes with the page URL.
	re := regexp.MustCompile(`(<img[^>]+src)="([^"]+)"`)
	updatedHTML := re.ReplaceAllString(html, `$1="`+link+`$2"`)
	// Prefix link href attributes. Note this must run on updatedHTML,
	// not html, or the first replacement would be discarded.
	re = regexp.MustCompile(`(<link[^>]+href)="([^"]+)"`)
	updatedHTML = re.ReplaceAllString(updatedHTML, `$1="`+link+`$2"`)
	return updatedHTML
}
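As the answer notes, the plain regex prefixes every matched link, including ones that are already absolute. A hedged alternative sketch that combines the same regex idea with net/url's ResolveReference, so absolute links pass through unchanged (the absolutize helper and its attribute pattern are my own, not part of the answer):

```go
package main

import (
	"fmt"
	"net/url"
	"regexp"
)

// absolutize rewrites src/href attribute values in html so relative
// links are resolved against base; already-absolute links are kept.
func absolutize(html, base string) string {
	b, err := url.Parse(base)
	if err != nil {
		return html
	}
	re := regexp.MustCompile(`(src|href)="([^"]+)"`)
	return re.ReplaceAllStringFunc(html, func(m string) string {
		parts := re.FindStringSubmatch(m)
		ref, err := url.Parse(parts[2])
		if err != nil {
			return m // leave unparsable values untouched
		}
		return parts[1] + `="` + b.ResolveReference(ref).String() + `"`
	})
}

func main() {
	in := `<link rel="stylesheet" href="styles.css"><img src="img.jpg">`
	fmt.Println(absolutize(in, "http://bbc.com/"))
	// <link rel="stylesheet" href="http://bbc.com/styles.css"><img src="http://bbc.com/img.jpg">
}
```

For production use, an HTML parser such as golang.org/x/net/html would be more robust than regexes, since attributes can use single quotes or no quotes at all.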
Answer 2
Score: 1
I built a package for downloading content from any URL, including images, CSS, JS, and video.
Check it out: https://github.com/Riaz-Mahmud/Websitebackup
Installation

composer require backdoor/websitebackup

Usage

use Backdoor\WebsiteBackup\WebsiteBackup;

function siteBackup() {
	$url = 'link to your website page to backup';
	$path = 'path to save backup file';

	$websiteBackup = new WebsiteBackup();
	$backup = $websiteBackup->backup($url, $path);
}