How to download HTTP directory with all files and sub-directories as they appear on the online files/folders list using golang?


Question

Currently I am downloading files using the function below, and I want to be able to download folders from the URL as well.

Any help would be appreciated.

package main

import (
    "fmt"
    "io"
    "net/http"
    "os"
)

func main() {
    fileUrl := "http://example.com/file.txt"
    err := DownloadFile("./example.txt", fileUrl)
    if err != nil {
        panic(err)
    }
    fmt.Println("Downloaded: " + fileUrl)
}

// DownloadFile saves the contents of a URL to a local file.
func DownloadFile(filepath string, url string) error {

    // Get the data; check the error before touching resp,
    // which is nil when err is non-nil.
    resp, err := http.Get(url)
    if err != nil {
        return err
    }
    defer resp.Body.Close()

    contentType := resp.Header.Get("Content-Type")
    if contentType != "application/octet-stream" {
        // Returning an error (instead of printing and returning nil)
        // keeps main from reporting a successful download.
        return fmt.Errorf("requested URL is not downloadable")
    }

    // Create the file
    out, err := os.Create(filepath)
    if err != nil {
        return err
    }
    defer out.Close()

    // Write the body to the file
    _, err = io.Copy(out, resp.Body)
    return err
}

I have referred to the link below:
https://stackoverflow.com/questions/23446635/how-to-download-http-directory-with-all-files-and-sub-directories-as-they-appear

but I want to do it in Go.


Answer 1

Score: 0

You can find the algorithm that wget --recursive implements here: https://www.gnu.org/software/wget/manual/html_node/Recursive-Download.html

Basically, you fetch the page, parse the HTML, and follow each href link (and each CSS link if necessary). The links can be extracted as described at https://vorozhko.net/get-all-links-from-html-page-with-go-lang; a sketch of that step follows below.
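
For the extraction step, here is a minimal sketch using the golang.org/x/net/html tokenizer (one common way to do it; extractLinks and the example URL are illustrative names, not a fixed API):

package main

import (
    "fmt"
    "net/http"

    "golang.org/x/net/html"
)

// extractLinks fetches pageURL and returns the href of every <a> tag on it.
func extractLinks(pageURL string) ([]string, error) {
    resp, err := http.Get(pageURL)
    if err != nil {
        return nil, err
    }
    defer resp.Body.Close()

    var links []string
    z := html.NewTokenizer(resp.Body)
    for {
        switch z.Next() {
        case html.ErrorToken:
            // ErrorToken covers both end-of-document and read errors;
            // for a sketch, stopping either way is fine.
            return links, nil
        case html.StartTagToken:
            t := z.Token()
            if t.Data == "a" {
                for _, attr := range t.Attr {
                    if attr.Key == "href" {
                        links = append(links, attr.Val)
                    }
                }
            }
        }
    }
}

func main() {
    links, err := extractLinks("http://example.com/files/")
    if err != nil {
        panic(err)
    }
    for _, link := range links {
        fmt.Println(link)
    }
}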

Once you have all the links, just request each of them and decide based on the Content-Type header: save the response if it is not text/html, or parse it for further links if it is.

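Putting the pieces together, a rough sketch of such a recursive downloader follows. It assumes an Apache/nginx-style index page where sub-directory links end in a trailing slash, and the href filtering (skipping parent, absolute, and query links) is a deliberate simplification; downloadDir and the example URL are illustrative:

package main

import (
    "io"
    "net/http"
    "net/url"
    "os"
    "path"
    "path/filepath"
    "strings"

    "golang.org/x/net/html"
)

// downloadDir recursively mirrors an HTTP directory listing into destDir.
// visited guards against fetching the same URL twice.
func downloadDir(pageURL, destDir string, visited map[string]bool) error {
    if visited[pageURL] {
        return nil
    }
    visited[pageURL] = true

    resp, err := http.Get(pageURL)
    if err != nil {
        return err
    }
    defer resp.Body.Close()

    // Anything that is not HTML is treated as a file: save it and stop.
    if !strings.HasPrefix(resp.Header.Get("Content-Type"), "text/html") {
        if err := os.MkdirAll(destDir, 0o755); err != nil {
            return err
        }
        out, err := os.Create(filepath.Join(destDir, path.Base(resp.Request.URL.Path)))
        if err != nil {
            return err
        }
        defer out.Close()
        _, err = io.Copy(out, resp.Body)
        return err
    }

    // HTML is treated as a directory listing: extract and follow each href.
    base, err := url.Parse(pageURL)
    if err != nil {
        return err
    }
    z := html.NewTokenizer(resp.Body)
    for {
        tt := z.Next()
        if tt == html.ErrorToken {
            return nil // end of the listing page
        }
        if tt != html.StartTagToken {
            continue
        }
        t := z.Token()
        if t.Data != "a" {
            continue
        }
        for _, a := range t.Attr {
            // Skip parent, absolute, and sort-order links so we stay inside the tree.
            if a.Key != "href" || strings.HasPrefix(a.Val, "/") ||
                strings.HasPrefix(a.Val, "..") || strings.HasPrefix(a.Val, "?") {
                continue
            }
            ref, err := base.Parse(a.Val)
            if err != nil {
                continue
            }
            sub := destDir
            if strings.HasSuffix(a.Val, "/") {
                // A trailing slash marks a sub-directory in typical index pages.
                sub = filepath.Join(destDir, strings.TrimSuffix(a.Val, "/"))
            }
            if err := downloadDir(ref.String(), sub, visited); err != nil {
                return err
            }
        }
    }
}

func main() {
    err := downloadDir("http://example.com/files/", "./files", map[string]bool{})
    if err != nil {
        panic(err)
    }
}

Each fetch handles exactly the two cases described above: a text/html response is parsed as another listing, and anything else is streamed to disk.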
