How to download HTTP directory with all files and sub-directories as they appear on the online files/folders list using golang?
Question
Currently I am downloading files using the function below, and I would like to download folders from the URL as well.
Any help would be appreciated.
package main

import (
    "fmt"
    "io"
    "net/http"
    "os"
)

func main() {
    fileUrl := "http://example.com/file.txt"
    err := DownloadFile("./example.txt", fileUrl)
    if err != nil {
        panic(err)
    }
    fmt.Println("Downloaded: " + fileUrl)
}

// DownloadFile will download a url to a local file.
func DownloadFile(filepath string, url string) error {
    // Get the data; check the error before touching the response.
    resp, err := http.Get(url)
    if err != nil {
        return err
    }
    defer resp.Body.Close()

    // Only save the body when the server marks it as a binary download.
    contentType := resp.Header.Get("Content-Type")
    if contentType == "application/octet-stream" {
        // Create the file
        out, err := os.Create(filepath)
        if err != nil {
            return err
        }
        defer out.Close()

        // Write the body to file
        _, err = io.Copy(out, resp.Body)
        return err
    }

    fmt.Println("Requested URL is not downloadable")
    return nil
}
I have referred to the link below:
https://stackoverflow.com/questions/23446635/how-to-download-http-directory-with-all-files-and-sub-directories-as-they-appear
but I wanted it in golang.
Answer 1

Score: 0
Here you can find the algorithm behind the wget --recursive implementation: https://www.gnu.org/software/wget/manual/html_node/Recursive-Download.html

Basically, you access the page, then parse the HTML and follow each href link (and each CSS link if necessary), which can be extracted like this: https://vorozhko.net/get-all-links-from-html-page-with-go-lang
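
To make that concrete, here is a minimal sketch of the link-extraction step using the golang.org/x/net/html parser. The directory URL in main is a made-up placeholder; a real auto-indexed listing will also contain sort and parent-directory links that you would want to filter out.

package main

import (
    "fmt"
    "net/http"

    "golang.org/x/net/html"
)

// extractLinks fetches pageURL and returns the value of every href
// attribute found in the returned HTML document.
func extractLinks(pageURL string) ([]string, error) {
    resp, err := http.Get(pageURL)
    if err != nil {
        return nil, err
    }
    defer resp.Body.Close()

    doc, err := html.Parse(resp.Body)
    if err != nil {
        return nil, err
    }

    var links []string
    var visit func(n *html.Node)
    visit = func(n *html.Node) {
        if n.Type == html.ElementNode && n.Data == "a" {
            for _, a := range n.Attr {
                if a.Key == "href" {
                    links = append(links, a.Val)
                }
            }
        }
        for c := n.FirstChild; c != nil; c = c.NextSibling {
            visit(c)
        }
    }
    visit(doc)
    return links, nil
}

func main() {
    // Placeholder URL: point this at an auto-indexed directory listing.
    links, err := extractLinks("http://example.com/files/")
    if err != nil {
        panic(err)
    }
    for _, link := range links {
        fmt.Println(link)
    }
}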
Once you have all the links, just make a request for each of them and, based on the Content-Type header, save the response if it is not text/html, or parse it for more links if it is.
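
Below is a rough sketch of how those pieces could fit together, under stated assumptions rather than as a definitive implementation: the root URL and the ./download destination are placeholders, everything under the root is treated as part of the directory to mirror, and responses served as text/html are parsed for further links while anything else is saved to disk using its URL path.

package main

import (
    "io"
    "net/http"
    "net/url"
    "os"
    "path"
    "strings"

    "golang.org/x/net/html"
)

// visited prevents fetching the same URL twice (listing pages link back
// to themselves via sort and parent-directory links).
var visited = map[string]bool{}

// Crawl fetches rawURL; HTML pages are parsed for further links, and
// anything else is saved under destDir, mirroring the URL path.
func Crawl(rawURL, root, destDir string) error {
    if visited[rawURL] || !strings.HasPrefix(rawURL, root) {
        return nil // already handled, or outside the directory being mirrored
    }
    visited[rawURL] = true

    resp, err := http.Get(rawURL)
    if err != nil {
        return err
    }
    defer resp.Body.Close()

    // Non-HTML content is a file: save it and stop recursing.
    if !strings.HasPrefix(resp.Header.Get("Content-Type"), "text/html") {
        return save(resp.Body, destDir, rawURL)
    }

    doc, err := html.Parse(resp.Body)
    if err != nil {
        return err
    }
    base, err := url.Parse(rawURL)
    if err != nil {
        return err
    }
    for _, href := range hrefs(doc) {
        ref, err := url.Parse(href)
        if err != nil {
            continue // skip malformed links
        }
        // Resolve relative links against the current page, then recurse.
        if err := Crawl(base.ResolveReference(ref).String(), root, destDir); err != nil {
            return err
        }
    }
    return nil
}

// hrefs collects every href attribute in the parsed document.
func hrefs(n *html.Node) []string {
    var out []string
    if n.Type == html.ElementNode && n.Data == "a" {
        for _, a := range n.Attr {
            if a.Key == "href" {
                out = append(out, a.Val)
            }
        }
    }
    for c := n.FirstChild; c != nil; c = c.NextSibling {
        out = append(out, hrefs(c)...)
    }
    return out
}

// save writes the response body to destDir, reusing the URL path.
func save(body io.Reader, destDir, rawURL string) error {
    u, err := url.Parse(rawURL)
    if err != nil {
        return err
    }
    target := path.Join(destDir, u.Path)
    if err := os.MkdirAll(path.Dir(target), 0755); err != nil {
        return err
    }
    out, err := os.Create(target)
    if err != nil {
        return err
    }
    defer out.Close()
    _, err = io.Copy(out, body)
    return err
}

func main() {
    root := "http://example.com/files/" // placeholder directory listing
    if err := Crawl(root, root, "./download"); err != nil {
        panic(err)
    }
}

A real mirroring tool would also want a depth limit and stripping of fragments and query-string sort links, which this sketch only handles through the visited map.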