How to unmarshal XML data from an online XML file

Question

I have an XML file at, say, https://www.notre-shop.com/sitemap_products_1.xml, and I want to unmarshal this XML in my Go code, so I wrote this:

package main

import (
    "encoding/xml"
    "fmt"
    "io/ioutil"
    "log"
    "net/http"
)

var Product struct {
    Locs []string `xml:"url>loc"`
    Name []string `xml:"url>image:title"`
}

func main() {
    res, err := http.Get("https://www.notre-shop.com/sitemap_products_1.xml")
    if err != nil {
        log.Fatal(err)
    }

    data, err := ioutil.ReadAll(res.Body)
    if err != nil {
        log.Fatal(err)
    }
    defer res.Body.Close()

    err = xml.Unmarshal(data, &Product)
    if err != nil {
        log.Fatal(err)
    }
    for x := range Product.Name {
        fmt.Println(Product.Name[x], Product.Locs[x])
    }
}

But this doesn't print anything. What am I doing wrong?

Here is the complete code on the Go Playground: https://play.golang.org/p/pZ6j4-lSEz

Answer 1

Score: 3

Please try the following code, which works for me (note: you could also use ioutil.ReadAll and xml.Unmarshal as you had before, instead of xml.NewDecoder and Decode):

package main

import (
	"encoding/xml"
	"fmt"
	"log"
	"net/http"
)

// <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
//    <url>
//        <loc>
//            https://www.notre-shop.com/products/test-product-releasing-soon-2
//        </loc>
//        <lastmod>2017-01-17T08:04:44Z</lastmod>
//        <changefreq>daily</changefreq>
//        <image:image>
//            <image:loc>
//                https://cdn.shopify.com/s/files/1/0624/0605/products/NOTRE-CHICAGO-QK9C9548_fde37b05-495e-47b0-8dd1-b053c9ed3545.jpg?v=1481853712
//            </image:loc>
//            <image:title>Test Product Releasing Soon 2</image:title>
//        </image:image>
//    </url>
// </urlset>
type URLSet struct {
	XMLName xml.Name `xml:"urlset"` // xml.Name is the conventional type for an XMLName field

	URLs []URL `xml:"url"`
}

type URL struct {
	Loc   string `xml:"loc"`
	Image Image  `xml:"image"`
}

type Image struct {
	Title string `xml:"title"`
}

func main() {
	resp, err := http.Get("https://www.notre-shop.com/sitemap_products_1.xml")
	if err != nil {
		log.Fatalln(err) // log.Fatalln exits the program, so call it only after checking err != nil
	}
	defer resp.Body.Close()

	var urlSet URLSet
	if err = xml.NewDecoder(resp.Body).Decode(&urlSet); err != nil {
		log.Fatalln(err)
	}

	for _, url := range urlSet.URLs {
		fmt.Println(url.Loc, url.Image.Title)
	}
}

Answer 2

Score: 0

This is what the XML specification says:

> The Namespaces in XML Recommendation [XML Names] assigns a meaning
> to names containing colon characters. Therefore, authors should not
> use the colon in XML names except for namespace purposes, but XML
> processors must accept the colon as a name character.

And this is how the XML Namespaces Recommendation is described:

> The XML Namespaces Recommendation expresses universal names in an
> indirect way that is compatible with XML 1.0. In effect the XML
> Namespaces Recommendation defines a mapping from an XML 1.0 tree where
> element type names and attribute names are local names into a tree
> where element type names and attribute names can be universal names.
> The mapping is based on the idea of a prefix. If an element type name
> or attribute name contains a colon, then the mapping treats the part
> of the name before the colon as a prefix, and the part of the name
> after the colon as the local name. A prefix foo refers to the URI
> specified in the value of the xmlns:foo attribute.
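
You can watch Go apply this prefix mapping by walking the tokens yourself. The sketch below is only an illustration: the sample fragment and the elementNames helper are hypothetical, while the namespace URI is the one declared in the sitemap from the question. Decoder.Token reports each element name as an xml.Name, with the namespace URI in Space and the part after the colon in Local.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// sample is a hypothetical fragment; the namespace URI matches the one
// declared in the sitemap discussed here.
const sample = `<url xmlns:image="http://www.google.com/schemas/sitemap-image/1.1"><image:title>Test</image:title></url>`

// elementNames returns the name of every start element in doc, in order.
func elementNames(doc string) ([]xml.Name, error) {
	dec := xml.NewDecoder(strings.NewReader(doc))
	var names []xml.Name
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return names, nil
		}
		if err != nil {
			return nil, err
		}
		if start, ok := tok.(xml.StartElement); ok {
			names = append(names, start.Name)
		}
	}
}

func main() {
	names, err := elementNames(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, n := range names {
		// For <image:title>, Local is "title" and Space is the namespace URI.
		fmt.Printf("space=%q local=%q\n", n.Space, n.Local)
	}
}
```

Since the decoder never hands over a local name of "image:title", a struct tag spelled that way has nothing to match against.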

You cannot use a name with a colon (:) in the struct tag to reach the inner elements. Instead, leave out the prefix and use only the local name.

Also, for performance and memory reasons, since you already have an io.Reader, you may use xml.Decoder instead of xml.Unmarshal. Here is your code rewritten:

package main

import (
	"encoding/xml"
	"fmt"
	"log"
	"net/http"
)

var Product struct {
	Locs []string `xml:"url>loc"`
	Name []Image  `xml:"url>image"`
}

type Image struct {
	Title string `xml:"title"`
}

func main() {
	res, err := http.Get("https://www.notre-shop.com/sitemap_products_1.xml")
	if err != nil {
		log.Fatal(err)
	}
	defer res.Body.Close()
	decoder := xml.NewDecoder(res.Body)
	err = decoder.Decode(&Product)
	if err != nil {
		log.Fatal(err)
	}
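	// Locs and Name are filled independently, so the indexes line up only
	// if every <url> element contains exactly one <image:image>.
	for x := range Product.Name {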
		fmt.Println(Product.Name[x].Title, Product.Locs[x])
	}
}
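
If matching on the local name alone feels too loose, encoding/xml also accepts tags of the form "namespace-URL name". The following is a rough sketch under that assumption, not a drop-in replacement: the nsDoc fragment, the example.com URLs, and the names nsImage, nsURL, and parseURL are made up, and the two namespace URIs are the ones declared in the sitemap from the question. It uses nested structs rather than a>b paths to keep the namespace matching simple.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"log"
)

// nsDoc is a hypothetical single <url> entry with both namespaces declared.
const nsDoc = `<url xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <loc>https://example.com/products/a</loc>
  <image:image>
    <image:loc>https://example.com/a.jpg</image:loc>
    <image:title>Product A</image:title>
  </image:image>
</url>`

// nsImage matches children of <image:image> by namespace URI and local name.
type nsImage struct {
	Loc   string `xml:"http://www.google.com/schemas/sitemap-image/1.1 loc"`
	Title string `xml:"http://www.google.com/schemas/sitemap-image/1.1 title"`
}

// nsURL keeps the sitemap <loc> apart from the image <loc> by namespace.
type nsURL struct {
	Loc   string  `xml:"http://www.sitemaps.org/schemas/sitemap/0.9 loc"`
	Image nsImage `xml:"http://www.google.com/schemas/sitemap-image/1.1 image"`
}

// parseURL unmarshals a single <url> element.
func parseURL(doc string) (nsURL, error) {
	var u nsURL
	err := xml.Unmarshal([]byte(doc), &u)
	return u, err
}

func main() {
	u, err := parseURL(nsDoc)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(u.Loc, u.Image.Loc, u.Image.Title)
}
```

The tag names the namespace URI, not the image prefix, so it keeps working even if a document binds the same URI to a different prefix.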

Here is the playground link: play
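
One thing to watch with two parallel slices: they are filled independently, so they can drift apart when some url has no image. The sketch below compares three tag styles on a made-up document whose first url has no image. The doc constant, the example.com URLs, and the names colonTag, flat, entry, perURL, and decode are all hypothetical.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// doc is a hypothetical sitemap in which the first <url> has no image.
const doc = `<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <url><loc>https://example.com/</loc></url>
  <url>
    <loc>https://example.com/products/a</loc>
    <image:image><image:title>Product A</image:title></image:image>
  </url>
</urlset>`

// colonTag mirrors the tag from the question; it never matches anything.
type colonTag struct {
	Names []string `xml:"url>image:title"`
}

// flat collects locations and titles into two independent slices.
type flat struct {
	Locs  []string `xml:"url>loc"`
	Names []string `xml:"url>image>title"`
}

// entry keeps the location and title of one <url> together.
type entry struct {
	Loc   string `xml:"loc"`
	Title string `xml:"image>title"`
}

type perURL struct {
	Entries []entry `xml:"url"`
}

// decode unmarshals doc into v and panics on malformed input.
func decode(v interface{}) {
	if err := xml.Unmarshal([]byte(doc), v); err != nil {
		panic(err)
	}
}

func main() {
	var c colonTag
	var f flat
	var p perURL
	decode(&c)
	decode(&f)
	decode(&p)

	fmt.Println("colon tag matches:", len(c.Names))
	fmt.Println("flat lengths:", len(f.Locs), len(f.Names))
	for _, e := range p.Entries {
		fmt.Printf("loc=%q title=%q\n", e.Loc, e.Title)
	}
}
```

Here the colon tag finds no titles, the flat struct ends up with two locations but only one title, and the per-url struct keeps each title next to its own location, leaving the title empty where there is no image.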

huangapple
  • Posted on January 24, 2017, 14:05:37
  • When reposting, please keep the link to this article: https://go.coder-hub.com/41821088.html