How to unmarshal xml data from online xml file
Question
I have an XML file at, say, https://www.notre-shop.com/sitemap_products_1.xml, and I want to unmarshal it in my Go code, so I wrote this:
    package main

    import (
        "encoding/xml"
        "fmt"
        "io/ioutil"
        "log"
        "net/http"
    )

    var Product struct {
        Locs []string `xml:"url>loc"`
        Name []string `xml:"url>image:title"`
    }

    func main() {
        res, err := http.Get("https://www.notre-shop.com/sitemap_products_1.xml")
        if err != nil {
            log.Fatal(err)
        }
        data, err := ioutil.ReadAll(res.Body)
        if err != nil {
            log.Fatal(err)
        }
        defer res.Body.Close()
        err = xml.Unmarshal(data, &Product)
        if err != nil {
            log.Fatal(err)
        }
        for x, _ := range Product.Name {
            fmt.Println(Product.Name[x], Product.Locs[x])
        }
    }
But this doesn't print anything. What am I doing wrong?
The complete code is on the Go Playground: https://play.golang.org/p/pZ6j4-lSEz
Answer 1
Score: 3
Please try the following code, which works for me (note: you could also keep ioutil.ReadAll and xml.Unmarshal as you had before, instead of xml.NewDecoder and Decode):
    package main

    import (
        "encoding/xml"
        "fmt"
        "log"
        "net/http"
    )

    // <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
    //   <url>
    //     <loc>
    //       https://www.notre-shop.com/products/test-product-releasing-soon-2
    //     </loc>
    //     <lastmod>2017-01-17T08:04:44Z</lastmod>
    //     <changefreq>daily</changefreq>
    //     <image:image>
    //       <image:loc>
    //         https://cdn.shopify.com/s/files/1/0624/0605/products/NOTRE-CHICAGO-QK9C9548_fde37b05-495e-47b0-8dd1-b053c9ed3545.jpg?v=1481853712
    //       </image:loc>
    //       <image:title>Test Product Releasing Soon 2</image:title>
    //     </image:image>
    //   </url>
    // </urlset>

    // URLSet maps the root <urlset> element.
    type URLSet struct {
        XMLName xml.Name `xml:"urlset"` // xml.Name is the documented type for XMLName
        URLs    []URL    `xml:"url"`
    }

    // URL maps one <url> entry.
    type URL struct {
        Loc   string `xml:"loc"`
        Image Image  `xml:"image"` // matches <image:image> by its local name
    }

    // Image maps the nested <image:image> element.
    type Image struct {
        Title string `xml:"title"` // matches <image:title> by its local name
    }

    func main() {
        resp, err := http.Get("https://www.notre-shop.com/sitemap_products_1.xml")
        if err != nil {
            log.Fatalln(err) // log.Fatal always exits the program, need to check err != nil first
        }
        defer resp.Body.Close()

        var urlSet URLSet
        if err = xml.NewDecoder(resp.Body).Decode(&urlSet); err != nil {
            log.Fatalln(err)
        }
        for _, url := range urlSet.URLs {
            fmt.Println(url.Loc, url.Image.Title)
        }
    }
Answer 2
Score: 0
This is what the XML specification says:
> The Namespaces in XML Recommendation [XML Names] assigns a meaning
> to names containing colon characters. Therefore, authors should not
> use the colon in XML names except for namespace purposes, but XML
> processors must accept the colon as a name character.
This is what the XML Namespaces Recommendation says:
> The XML Namespaces Recommendation expresses universal names in an
> indirect way that is compatible with XML 1.0. In effect the XML
> Namespaces Recommendation defines a mapping from an XML 1.0 tree where
> element type names and attribute names are local names into a tree
> where element type names and attribute names can be universal names.
> The mapping is based on the idea of a prefix. If an element type name
> or attribute name contains a colon, then the mapping treats the part
> of the name before the colon as a prefix, and the part of the name
> after the colon as the local name. A prefix foo refers to the URI
> specified in the value of the xmlns:foo attribute.
You cannot use a name containing a colon (:) in a struct tag to reach the inner elements; drop the prefix and match on the local name instead. For performance and memory reasons, since you already have an io.Reader, you may also use xml.Decoder instead of xml.Unmarshal. Here is your code rewritten:
    package main

    import (
        "encoding/xml"
        "fmt"
        "log"
        "net/http"
    )

    var Product struct {
        Locs []string `xml:"url>loc"`
        Name []Image  `xml:"url>image"`
    }

    type Image struct {
        Title string `xml:"title"`
    }

    func main() {
        res, err := http.Get("https://www.notre-shop.com/sitemap_products_1.xml")
        if err != nil {
            log.Fatal(err)
        }
        defer res.Body.Close()

        decoder := xml.NewDecoder(res.Body)
        err = decoder.Decode(&Product)
        if err != nil {
            log.Fatal(err)
        }
        for x := range Product.Name {
            fmt.Println(Product.Name[x].Title, Product.Locs[x])
        }
    }
Here is the play link: play