2013-09-30 02:08:39, go

How to decode Reddit's RSS using Golang?

Question

I've been playing about with Go's XML package and cannot see what is wrong with the following code.

package main

import (
    "encoding/xml"
    "fmt"
    "net/http"
)

type Channel struct {
    Items Item
}

type Item struct {
    Title       string `xml:"title"`
    Link        string `xml:"link"`
    Description string `xml:"description"`
}

func main() {

    var items = new(Channel)
    res, err := http.Get("http://www.reddit.com/r/google.xml")

    if err != nil {
        fmt.Printf("Error: %v\n", err)
    } else {
        decoded := xml.NewDecoder(res.Body)

        err = decoded.Decode(items)

        if err != nil {
            fmt.Printf("Error: %v\n", err)
        }

        fmt.Printf("Title: %s\n", items.Items.Title)
    }
}

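The above code runs without any errors and prints to the terminal:

Title:

The struct seems empty, but I can't see why it isn't getting populated with the XML data.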


Answer 1

Score: 5

I'd be completely explicit like this and name all the XML parts.

See the playground for a full working example.

type Rss struct {
    Channel Channel `xml:"channel"`
}

type Channel struct {
    Title       string `xml:"title"`
    Link        string `xml:"link"`
    Description string `xml:"description"`
    Items       []Item `xml:"item"`
}

type Item struct {
    Title       string `xml:"title"`
    Link        string `xml:"link"`
    Description string `xml:"description"`
}
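
Since the playground link is not reproduced here, the following is a minimal sketch of how these structs might be used. It decodes a made-up RSS document held in a string rather than fetching Reddit's feed; `sampleRSS` and `decodeRss` are hypothetical names introduced only for this illustration.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// Rss, Channel and Item mirror the structs from the answer above.
type Rss struct {
	Channel Channel `xml:"channel"`
}

type Channel struct {
	Title       string `xml:"title"`
	Link        string `xml:"link"`
	Description string `xml:"description"`
	Items       []Item `xml:"item"`
}

type Item struct {
	Title       string `xml:"title"`
	Link        string `xml:"link"`
	Description string `xml:"description"`
}

// sampleRSS is invented sample data (an assumption), not Reddit's real feed.
const sampleRSS = `<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <link>http://example.com/</link>
    <description>Hypothetical sample data</description>
    <item>
      <title>First post</title>
      <link>http://example.com/1</link>
      <description>Body of the first post</description>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.com/2</link>
      <description>Body of the second post</description>
    </item>
  </channel>
</rss>`

// decodeRss is a hypothetical helper that decodes an RSS document from a string.
func decodeRss(doc string) (*Rss, error) {
	var rss Rss
	if err := xml.NewDecoder(strings.NewReader(doc)).Decode(&rss); err != nil {
		return nil, err
	}
	return &rss, nil
}

func main() {
	rss, err := decodeRss(sampleRSS)
	if err != nil {
		fmt.Println("Error:", err)
		return
	}
	fmt.Println("Channel:", rss.Channel.Title)
	for _, item := range rss.Channel.Items {
		fmt.Printf("%s (%s)\n", item.Title, item.Link)
	}
}
```

To read from the network instead, the response body from http.Get could be passed to xml.NewDecoder in place of the strings.NewReader used here.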


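Answer 2

Score: 4

Your program comes close, but needs to specify just a little bit more context to match the XML document.

You need to revise your field tags to help guide the XML binding down through your Channel structure to your Item structure: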

type Channel struct {
    Items []Item `xml:"channel>item"`
}

type Item struct {
    Title       string `xml:"title"`
    Link        string `xml:"link"`
    Description string `xml:"description"`
}

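Per the documentation for encoding/xml.Unmarshal(), the seventh bullet item applies here:

> If the XML element contains a sub-element whose name matches the prefix of a tag formatted as "a" or "a>b>c", unmarshal will descend into the XML structure looking for elements with the given names, and will map the innermost elements to that struct field. A tag starting with ">" is equivalent to one starting with the field name followed by ">".

In your case, you're looking to descend through the top-level <rss> element's <channel> element to find each <item> element. Note, though, that we don't need to (and in fact can't) specify that the Channel struct should burrow through the top-level <rss> element by writing the Items field's tag as

`xml:"rss>channel>item"`

That context is implicit; the struct supplied to Unmarshal() already maps to the top-level XML element.

Note too that your Channel struct's Items field should be of type slice-of-Item, not just a single Item.

You mentioned that you're having trouble getting the proposal to work. Here's a complete listing that I find works as one would expect: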

package main

import (
    "encoding/xml"
    "fmt"
    "net/http"
    "os"
)

type Channel struct {
    Items []Item `xml:"channel>item"`
}

type Item struct {
    Title       string `xml:"title"`
    Link        string `xml:"link"`
    Description string `xml:"description"`
}

func main() {
    if res, err := http.Get("http://www.reddit.com/r/google.xml"); err != nil {
        fmt.Println("Error retrieving resource:", err)
        os.Exit(1)
    } else {
        channel := Channel{}
        if err := xml.NewDecoder(res.Body).Decode(&channel); err != nil {
            fmt.Println("Error:", err)
            os.Exit(1)
        } else if len(channel.Items) != 0 {
            item := channel.Items[0]
            fmt.Println("First title:", item.Title)
            fmt.Println("First link:", item.Link)
            fmt.Println("First description:", item.Description)
        }
    }
}
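
The point about the implicit top-level element can be checked without a network request. The sketch below is only an illustration: the type names `WithChannelPath` and `WithRssPath`, the helper functions, and the inline `sample` document are all hypothetical. It decodes the same small document with both tag paths and counts the items each one finds.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

type Item struct {
	Title string `xml:"title"`
}

// WithChannelPath uses the tag path recommended in this answer.
type WithChannelPath struct {
	Items []Item `xml:"channel>item"`
}

// WithRssPath repeats the root element in the path. The struct itself
// already stands for <rss>, so this path never matches a child element.
type WithRssPath struct {
	Items []Item `xml:"rss>channel>item"`
}

// sample is invented data (an assumption), not Reddit's real feed.
const sample = `<rss><channel>` +
	`<item><title>A</title></item>` +
	`<item><title>B</title></item>` +
	`</channel></rss>`

// countWithChannelPath reports how many items the "channel>item" path finds.
func countWithChannelPath(doc string) (int, error) {
	var v WithChannelPath
	err := xml.Unmarshal([]byte(doc), &v)
	return len(v.Items), err
}

// countWithRssPath reports how many items the "rss>channel>item" path finds.
func countWithRssPath(doc string) (int, error) {
	var v WithRssPath
	err := xml.Unmarshal([]byte(doc), &v)
	return len(v.Items), err
}

func main() {
	n, err := countWithChannelPath(sample)
	fmt.Println("channel>item:", n, err)

	n, err = countWithRssPath(sample)
	fmt.Println("rss>channel>item:", n, err)
}
```

With this sample, the first path finds both items while the second finds none and reports no error, which matches the silent empty result described in the question.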


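Answer 3

Score: 0

Nowadays the Reddit RSS feed seems to have changed to the Atom format. This means that parsing it as regular RSS will no longer work. The Atom functionality of go-rss can parse such feeds: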

// Feed struct for the feed root (the Atom <feed> element)
type Feed struct {
    Entry []Entry `xml:"entry"`
}

// Entry struct for each Entry in the Feed
type Entry struct {
    ID      string `xml:"id"`
    Title   string `xml:"title"`
    Updated string `xml:"updated"`
}

// Atom parses atom feeds
func Atom(resp *http.Response) (*Feed, error) {
    defer resp.Body.Close()
    xmlDecoder := xml.NewDecoder(resp.Body)
    // charset refers to a charset package imported elsewhere in go-rss;
    // the import is not shown in this excerpt.
    xmlDecoder.CharsetReader = charset.NewReader
    feed := Feed{}
    if err := xmlDecoder.Decode(&feed); err != nil {
        return nil, err
    }
    return &feed, nil
}
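
The excerpt above depends on go-rss and an HTTP response. As a rough standard-library-only sketch of the same idea, the code below decodes a made-up Atom document from a string. `sampleAtom` and `decodeAtom` are hypothetical names, the sample entries are invented, and the charset handling is left out on the assumption that the input is UTF-8.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// Feed and Entry follow the shape of the structs quoted from go-rss above.
type Feed struct {
	Entry []Entry `xml:"entry"`
}

type Entry struct {
	ID      string `xml:"id"`
	Title   string `xml:"title"`
	Updated string `xml:"updated"`
}

// sampleAtom is invented sample data (an assumption), not Reddit's real feed.
const sampleAtom = `<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example feed</title>
  <entry>
    <id>example-1</id>
    <title>First entry</title>
    <updated>2020-01-01T00:00:00+00:00</updated>
  </entry>
  <entry>
    <id>example-2</id>
    <title>Second entry</title>
    <updated>2020-01-02T00:00:00+00:00</updated>
  </entry>
</feed>`

// decodeAtom is a hypothetical helper that decodes an Atom document from a
// string. Tags without a namespace, such as "entry", match elements in any
// namespace, so the Atom xmlns on <feed> needs no special handling here.
func decodeAtom(doc string) (*Feed, error) {
	var feed Feed
	if err := xml.NewDecoder(strings.NewReader(doc)).Decode(&feed); err != nil {
		return nil, err
	}
	return &feed, nil
}

func main() {
	feed, err := decodeAtom(sampleAtom)
	if err != nil {
		fmt.Println("Error:", err)
		return
	}
	for _, e := range feed.Entry {
		fmt.Printf("%s: %s (updated %s)\n", e.ID, e.Title, e.Updated)
	}
}
```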

