2017-01-24 14:05:37 · go

How to unmarshal xml data from online xml file

Question

Suppose I have an XML file at https://www.notre-shop.com/sitemap_products_1.xml, and I want to unmarshal it in my Go code. This is what I tried:

package main
import (
    "encoding/xml"
    "fmt"
    "io/ioutil"
    "log"
    "net/http"
)
var Product struct {
    Locs []string `xml:"url>loc"`
    Name []string `xml:"url>image:title"`
}
func main() {
    res, err := http.Get("https://www.notre-shop.com/sitemap_products_1.xml")
    if err != nil {
        log.Fatal(err)
    }
    data, err := ioutil.ReadAll(res.Body)
    if err != nil {
        log.Fatal(err)
    }
    defer res.Body.Close()
    err = xml.Unmarshal(data, &Product)
    if err != nil {
        log.Fatal(err)
    }
    for x, _ := range Product.Name {
        fmt.Println(Product.Name[x], Product.Locs[x])
    }
}

But this doesn't print anything. What am I doing wrong?

Here is the complete code on the Go Playground: https://play.golang.org/p/pZ6j4-lSEz

Answer 1

Score: 3

Please try the following code, which works for me (note: you could also keep using ioutil.ReadAll and xml.Unmarshal as you had before, instead of xml.NewDecoder and Decode):

package main
import (
	"encoding/xml"
	"fmt"
	"log"
	"net/http"
)
// <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
//    <url>
//        <loc>
//            https://www.notre-shop.com/products/test-product-releasing-soon-2
//        </loc>
//        <lastmod>2017-01-17T08:04:44Z</lastmod>
//        <changefreq>daily</changefreq>
//        <image:image>
//            <image:loc>
//                https://cdn.shopify.com/s/files/1/0624/0605/products/NOTRE-CHICAGO-QK9C9548_fde37b05-495e-47b0-8dd1-b053c9ed3545.jpg?v=1481853712
//            </image:loc>
//            <image:title>Test Product Releasing Soon 2</image:title>
//        </image:image>
//    </url>
// </urlset>
type URLSet struct {
	XMLName xml.Name `xml:"urlset"` // the root element must be named urlset
	URLs    []URL    `xml:"url"`
}
type URL struct {
	Loc   string `xml:"loc"`
	Image Image  `xml:"image"` // matches <image:image> by its local name
}
type Image struct {
	Title string `xml:"title"` // matches <image:title> by its local name
}
func main() {
	resp, err := http.Get("https://www.notre-shop.com/sitemap_products_1.xml")
	if err != nil {
		log.Fatalln(err) // log.Fatal always exits the program, need to check err != nil first
	}
	defer resp.Body.Close()
	var urlSet URLSet
	if err = xml.NewDecoder(resp.Body).Decode(&urlSet); err != nil {
		log.Fatalln(err)
	}
	for _, url := range urlSet.URLs {
		fmt.Println(url.Loc, url.Image.Title)
	}
}
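
The note about keeping xml.Unmarshal can be sketched as follows. This is only an illustration under stated assumptions: the `sampleSitemap` constant is a hypothetical, trimmed-down stand-in for the real sitemap (so the example runs without network access), and `parseSitemap` is a helper name invented here, not part of the answer.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"log"
)

// sampleSitemap is a hypothetical, shortened copy of the sitemap layout
// shown in the answer; the real file contains many <url> entries.
const sampleSitemap = `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <url>
    <loc>https://www.notre-shop.com/products/test-product-releasing-soon-2</loc>
    <image:image>
      <image:title>Test Product Releasing Soon 2</image:title>
    </image:image>
  </url>
</urlset>`

type URLSet struct {
	XMLName xml.Name `xml:"urlset"`
	URLs    []URL    `xml:"url"`
}

type URL struct {
	Loc   string `xml:"loc"`
	Image Image  `xml:"image"`
}

type Image struct {
	Title string `xml:"title"`
}

// parseSitemap (hypothetical helper) unmarshals a document that is already
// fully in memory, e.g. the bytes returned by ioutil.ReadAll(resp.Body).
func parseSitemap(data []byte) (URLSet, error) {
	var set URLSet
	err := xml.Unmarshal(data, &set)
	return set, err
}

func main() {
	set, err := parseSitemap([]byte(sampleSitemap))
	if err != nil {
		log.Fatalln(err)
	}
	for _, u := range set.URLs {
		fmt.Println(u.Loc, u.Image.Title)
	}
}
```

With a live response, the same helper would presumably be fed the result of ioutil.ReadAll(resp.Body); the struct definitions stay the same either way.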

Answer 2

Score: 0

This is what the XML specification says:

> The Namespaces in XML Recommendation [XML Names] assigns a meaning
> to names containing colon characters. Therefore, authors should not
> use the colon in XML names except for namespace purposes, but XML
> processors must accept the colon as a name character.

This is what the XML Namespaces Recommendation says:

> The XML Namespaces Recommendation expresses universal names in an
> indirect way that is compatible with XML 1.0. In effect the XML
> Namespaces Recommendation defines a mapping from an XML 1.0 tree where
> element type names and attribute names are local names into a tree
> where element type names and attribute names can be universal names.
> The mapping is based on the idea of a prefix. If an element type name
> or attribute name contains a colon, then the mapping treats the part
> of the name before the colon as a prefix, and the part of the name
> after the colon as the local name. A prefix foo refers to the URI
> specified in the value of the xmlns:foo attribute.
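
As a rough illustration of that prefix-to-URI mapping in Go: encoding/xml accepts struct tags of the form "namespace-URL name", which match on the namespace URI and the local name rather than on the prefix. The two sample documents and all type and function names below are hypothetical, made up for this sketch.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"log"
)

// Two hypothetical documents that bind different prefixes ("image" and
// "img") to the same namespace URI.
const withImagePrefix = `<urlset xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <url><image:image><image:title>Test Product</image:title></image:image></url>
</urlset>`

const withImgPrefix = `<urlset xmlns:img="http://www.google.com/schemas/sitemap-image/1.1">
  <url><img:image><img:title>Test Product</img:title></img:image></url>
</urlset>`

// The tags name the namespace URI, not the prefix used in the document.
type nsImage struct {
	Title string `xml:"http://www.google.com/schemas/sitemap-image/1.1 title"`
}

type nsURL struct {
	Image nsImage `xml:"http://www.google.com/schemas/sitemap-image/1.1 image"`
}

type nsURLSet struct {
	URLs []nsURL `xml:"url"`
}

// imageTitles (hypothetical helper) returns the image title of every <url>.
func imageTitles(doc string) ([]string, error) {
	var set nsURLSet
	if err := xml.Unmarshal([]byte(doc), &set); err != nil {
		return nil, err
	}
	titles := make([]string, 0, len(set.URLs))
	for _, u := range set.URLs {
		titles = append(titles, u.Image.Title)
	}
	return titles, nil
}

func main() {
	for _, doc := range []string{withImagePrefix, withImgPrefix} {
		titles, err := imageTitles(doc)
		if err != nil {
			log.Fatalln(err)
		}
		fmt.Println(titles)
	}
}
```

Both documents should decode to the same titles, since only the URI behind the prefix takes part in the match.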

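You cannot use a name containing a colon (:) in the struct tag to reach the inner elements. encoding/xml compares the tag against the element's local name, so a tag such as url>image:title never matches anything; drop the prefix instead.

For performance and memory reasons, since you already have an io.Reader, you can also use xml.Decoder instead of xml.Unmarshal. Here is your code rewritten: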

package main
import (
	"encoding/xml"
	"fmt"
	"log"
	"net/http"
)
var Product struct {
	Locs []string `xml:"url>loc"`
	Name []Image  `xml:"url>image"`
}
type Image struct {
	Title string `xml:"title"`
}
func main() {
	res, err := http.Get("https://www.notre-shop.com/sitemap_products_1.xml")
	if err != nil {
		log.Fatal(err)
	}
	defer res.Body.Close()
	decoder := xml.NewDecoder(res.Body)
	err = decoder.Decode(&Product)
	if err != nil {
		log.Fatal(err)
	}
	for x := range Product.Name {
		fmt.Println(Product.Name[x].Title, Product.Locs[x])
	}
}

