How to decode Reddit's RSS using Golang?

Question

I've been playing about with Go's XML package and cannot see what is wrong with the following code.

package main

import (
    "encoding/xml"
    "fmt"
    "net/http"
) 

type Channel struct {
    Items Item
}

type Item struct {
    Title       string `xml:"title"`
    Link        string `xml:"link"`
    Description string `xml:"description"`
}

func main() {

    var items = new(Channel)
    res, err := http.Get("http://www.reddit.com/r/google.xml")

    if err != nil {
        fmt.Printf("Error: %v\n", err)
    } else {
        decoded := xml.NewDecoder(res.Body)

        err = decoded.Decode(items)

        if err != nil {
            fmt.Printf("Error: %v\n", err)
        }

        fmt.Printf("Title: %s\n", items.Items.Title)
    }
}

The above code runs without any errors and prints to the terminal:

Title:

The struct seems empty but I can't see why it isn't getting populated with the XML data.

Answer 1

Score: 5

I'd be completely explicit, like this, and name all the XML parts.

See the playground for a full working example

type Rss struct {
	Channel Channel `xml:"channel"`
}

type Channel struct {
	Title       string `xml:"title"`
	Link        string `xml:"link"`
	Description string `xml:"description"`
	Items       []Item `xml:"item"`
}

type Item struct {
	Title       string `xml:"title"`
	Link        string `xml:"link"`
	Description string `xml:"description"`
}

Answer 2

Score: 4

Your program comes close, but needs to specify just a little bit more context to match the XML document.

You need to revise your field tags to help guide the XML binding down through your
Channel structure to your Item structure:

type Channel struct {
	Items []Item `xml:"channel>item"`
}

type Item struct {
	Title       string `xml:"title"`
	Link        string `xml:"link"`
	Description string `xml:"description"`
}

Per the documentation for encoding/xml.Unmarshal(), the seventh bullet item applies here:

> If the XML element contains a sub-element whose name matches
the prefix of a tag formatted as "a" or "a>b>c", unmarshal
will descend into the XML structure looking for elements with the
given names, and will map the innermost elements to that struct
field. A tag starting with ">" is equivalent to one starting
with the field name followed by ">".

In your case, you're looking to descend through the top-level `<rss>` element's `<channel>` elements to find each `<item>` element. Note, though, that we don't need to, and in fact can't, specify that the Channel struct should burrow through the top-level `<rss>` element by writing the Items field's tag as

`xml:"rss>channel>item"`

That context is implicit; the struct supplied to Unmarshal() already maps to the top-level XML element.

Note too that your Channel struct's Items field should be of type slice-of-Item, not just a single Item.


You mentioned that you're having trouble getting the proposal to work. Here's a complete listing that I find works as one would expect:

package main

import (
	"encoding/xml"
	"fmt"
	"net/http"
	"os"
)

type Channel struct {
	Items []Item `xml:"channel>item"`
}

type Item struct {
	Title       string `xml:"title"`
	Link        string `xml:"link"`
	Description string `xml:"description"`
}

func main() {
	res, err := http.Get("http://www.reddit.com/r/google.xml")
	if err != nil {
		fmt.Println("Error retrieving resource:", err)
		os.Exit(1)
	}
	// Release the underlying connection once the body has been read.
	defer res.Body.Close()

	channel := Channel{}
	if err := xml.NewDecoder(res.Body).Decode(&channel); err != nil {
		fmt.Println("Error:", err)
		os.Exit(1)
	}
	if len(channel.Items) != 0 {
		item := channel.Items[0]
		fmt.Println("First title:", item.Title)
		fmt.Println("First link:", item.Link)
		fmt.Println("First description:", item.Description)
	}
}

Answer 3

Score: 0

Nowadays the Reddit RSS feed seems to have changed to the Atom format, which means that parsing it as RSS will no longer work. The Atom functionality of go-rss can parse such feeds:

// Feed struct for RSS
type Feed struct {
	Entry []Entry `xml:"entry"`
}

// Entry struct for each Entry in the Feed
type Entry struct {
	ID      string `xml:"id"`
	Title   string `xml:"title"`
	Updated string `xml:"updated"`
}

// Atom parses atom feeds
func Atom(resp *http.Response) (*Feed, error) {
	defer resp.Body.Close()
	xmlDecoder := xml.NewDecoder(resp.Body)
	// charset.NewReader must fit Decoder.CharsetReader, i.e.
	// func(charset string, input io.Reader) (io.Reader, error). go-charset's
	// NewReader takes the charset name first; with golang.org/x/net/html/charset,
	// use charset.NewReaderLabel instead.
	xmlDecoder.CharsetReader = charset.NewReader
	feed := Feed{}
	if err := xmlDecoder.Decode(&feed); err != nil {
		return nil, err
	}
	return &feed, nil
}

huangapple
  • Published 2013-09-30 02:08:39
  • Please keep this article's link when reposting: https://go.coder-hub.com/19081479.html