How to unmarshal XML data from an online XML file

Question

Suppose I have an XML file at https://www.notre-shop.com/sitemap_products_1.xml and I want to unmarshal it in my Go code, so I did this:

    package main

    import (
        "encoding/xml"
        "fmt"
        "io/ioutil"
        "log"
        "net/http"
    )

    // Product collects every <loc> and image title found under <url> elements.
    var Product struct {
        Locs []string `xml:"url>loc"`
        Name []string `xml:"url>image:title"` // meant to match <image:title>
    }

    func main() {
        res, err := http.Get("https://www.notre-shop.com/sitemap_products_1.xml")
        if err != nil {
            log.Fatal(err)
        }
        defer res.Body.Close()

        data, err := ioutil.ReadAll(res.Body)
        if err != nil {
            log.Fatal(err)
        }

        err = xml.Unmarshal(data, &Product)
        if err != nil {
            log.Fatal(err)
        }
        for x := range Product.Name {
            fmt.Println(Product.Name[x], Product.Locs[x])
        }
    }

But this doesn't print anything. What am I doing wrong?

Here is the complete code on the Go Playground: https://play.golang.org/p/pZ6j4-lSEz

Answer 1

Score: 3

Please try the following code, which works for me (note: you could also use ioutil.ReadAll and xml.Unmarshal as you had before, instead of xml.NewDecoder(...).Decode):

    package main

    import (
        "encoding/xml"
        "fmt"
        "log"
        "net/http"
    )

    // The sitemap looks like this (one <url> entry shown):
    //
    // <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
    //   <url>
    //     <loc>
    //       https://www.notre-shop.com/products/test-product-releasing-soon-2
    //     </loc>
    //     <lastmod>2017-01-17T08:04:44Z</lastmod>
    //     <changefreq>daily</changefreq>
    //     <image:image>
    //       <image:loc>
    //         https://cdn.shopify.com/s/files/1/0624/0605/products/NOTRE-CHICAGO-QK9C9548_fde37b05-495e-47b0-8dd1-b053c9ed3545.jpg?v=1481853712
    //       </image:loc>
    //       <image:title>Test Product Releasing Soon 2</image:title>
    //     </image:image>
    //   </url>
    // </urlset>

    // URLSet is the <urlset> root element; XMLName has type xml.Name so the
    // decoder can record the root element's name.
    type URLSet struct {
        XMLName xml.Name `xml:"urlset"`
        URLs    []URL    `xml:"url"`
    }

    // URL is one <url> entry. The tags use local names only, so
    // <image:image> matches "image" whatever its namespace prefix.
    type URL struct {
        Loc   string `xml:"loc"`
        Image Image  `xml:"image"`
    }

    // Image holds the child of <image:image> we care about.
    type Image struct {
        Title string `xml:"title"`
    }

    func main() {
        resp, err := http.Get("https://www.notre-shop.com/sitemap_products_1.xml")
        if err != nil {
            log.Fatalln(err) // log.Fatalln exits the program, so call it only after checking err
        }
        defer resp.Body.Close()

        // Decode straight from the response body instead of reading it into memory first.
        var urlSet URLSet
        if err = xml.NewDecoder(resp.Body).Decode(&urlSet); err != nil {
            log.Fatalln(err)
        }
        for _, url := range urlSet.URLs {
            fmt.Println(url.Loc, url.Image.Title)
        }
    }
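The note above mentions that ioutil.ReadAll plus xml.Unmarshal would work just as well. As a minimal offline sketch of that route, the program below feeds the same struct layout to xml.Unmarshal using a trimmed copy of the sitemap excerpt from the code comment (whitespace around the URLs removed), so it runs without network access. The embedded string and the helper name parseSitemap are illustrative assumptions, not part of the original answer.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"log"
)

// Same layout as the answer above.
type URLSet struct {
	XMLName xml.Name `xml:"urlset"`
	URLs    []URL    `xml:"url"`
}

type URL struct {
	Loc   string `xml:"loc"`   // only the direct <loc> child of <url>
	Image Image  `xml:"image"` // matches <image:image> by local name
}

type Image struct {
	Title string `xml:"title"`
}

// sampleSitemap is a trimmed copy of the excerpt shown in the answer's comment,
// embedded so the example needs no network access.
const sampleSitemap = `<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <url>
    <loc>https://www.notre-shop.com/products/test-product-releasing-soon-2</loc>
    <lastmod>2017-01-17T08:04:44Z</lastmod>
    <changefreq>daily</changefreq>
    <image:image>
      <image:loc>https://cdn.shopify.com/s/files/1/0624/0605/products/NOTRE-CHICAGO-QK9C9548_fde37b05-495e-47b0-8dd1-b053c9ed3545.jpg?v=1481853712</image:loc>
      <image:title>Test Product Releasing Soon 2</image:title>
    </image:image>
  </url>
</urlset>`

// parseSitemap (hypothetical helper) unmarshals an in-memory document. With a
// live response you would pass the bytes from ioutil.ReadAll(resp.Body).
func parseSitemap(data []byte) (URLSet, error) {
	var urlSet URLSet
	err := xml.Unmarshal(data, &urlSet)
	return urlSet, err
}

func main() {
	urlSet, err := parseSitemap([]byte(sampleSitemap))
	if err != nil {
		log.Fatalln(err)
	}
	for _, u := range urlSet.URLs {
		fmt.Println(u.Loc, u.Image.Title)
	}
}
```

Run as-is, it should print the product URL followed by "Test Product Releasing Soon 2". The nested <image:loc> is ignored because URL.Loc only matches a <loc> that is a direct child of <url>.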

Answer 2

Score: 0

This is what the XML specification says:

> The Namespaces in XML Recommendation [XML Names] assigns a meaning
> to names containing colon characters. Therefore, authors should not
> use the colon in XML names except for namespace purposes, but XML
> processors must accept the colon as a name character.

And this is an explanation of what the XML Namespaces Recommendation does:

> The XML Namespaces Recommendation expresses universal names in an
> indirect way that is compatible with XML 1.0. In effect the XML
> Namespaces Recommendation defines a mapping from an XML 1.0 tree where
> element type names and attribute names are local names into a tree
> where element type names and attribute names can be universal names.
> The mapping is based on the idea of a prefix. If an element type name
> or attribute name contains a colon, then the mapping treats the part
> of the name before the colon as a prefix, and the part of the name
> after the colon as the local name. A prefix foo refers to the URI
> specified in the value of the xmlns:foo attribute.

You cannot use a name containing a colon (:) in the struct tag to reach the inner elements; instead, drop the prefix and use the local name. Here is your code rewritten:

For performance and memory reasons, since res.Body is already an io.Reader, you can use xml.Decoder instead of reading the whole body into memory and calling xml.Unmarshal.

    package main

    import (
        "encoding/xml"
        "fmt"
        "log"
        "net/http"
    )

    // Product collects parallel slices of <loc> values and <image:image>
    // elements; index x in one is assumed to belong to index x in the other.
    var Product struct {
        Locs []string `xml:"url>loc"`
        Name []Image  `xml:"url>image"` // local name only, no "image:" prefix
    }

    // Image holds the <image:title> child of <image:image>.
    type Image struct {
        Title string `xml:"title"`
    }

    func main() {
        res, err := http.Get("https://www.notre-shop.com/sitemap_products_1.xml")
        if err != nil {
            log.Fatal(err)
        }
        defer res.Body.Close()

        // Decode from the body as it streams in rather than buffering it first.
        decoder := xml.NewDecoder(res.Body)
        if err := decoder.Decode(&Product); err != nil {
            log.Fatal(err)
        }
        for x := range Product.Name {
            if x >= len(Product.Locs) {
                break // avoid index out of range if there are more images than <loc> entries
            }
            fmt.Println(Product.Name[x].Title, Product.Locs[x])
        }
    }

Here is the Playground link: play

huangapple
  • Published 2017-01-24 14:05:37
  • When reposting, please keep the link to this article: https://go.coder-hub.com/41821088.html