How to decode Reddit's RSS using Golang?
Question
I've been playing about with Go's XML package and cannot see what is wrong with the following code.
package main

import (
	"encoding/xml"
	"fmt"
	"net/http"
)

type Channel struct {
	Items Item
}

type Item struct {
	Title       string `xml:"title"`
	Link        string `xml:"link"`
	Description string `xml:"description"`
}

func main() {
	var items = new(Channel)
	res, err := http.Get("http://www.reddit.com/r/google.xml")
	if err != nil {
		fmt.Printf("Error: %v\n", err)
	} else {
		decoded := xml.NewDecoder(res.Body)
		err = decoded.Decode(items)
		if err != nil {
			fmt.Printf("Error: %v\n", err)
		}
		fmt.Printf("Title: %s\n", items.Items.Title)
	}
}
The above code runs without any errors and prints to the terminal:
Title:
The struct seems empty but I can't see why it isn't getting populated with the XML data.
Answer 1
Score: 5
I'd be completely explicit and name every part of the XML document, as in the structs below.
See the playground for a full working example.
type Rss struct {
	Channel Channel `xml:"channel"`
}

type Channel struct {
	Title       string `xml:"title"`
	Link        string `xml:"link"`
	Description string `xml:"description"`
	Items       []Item `xml:"item"`
}

type Item struct {
	Title       string `xml:"title"`
	Link        string `xml:"link"`
	Description string `xml:"description"`
}
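Since the playground link may not be reachable, here is a minimal sketch of how these structs could be used. The `sampleRSS` document and the `parseRSS` helper are made up for illustration and are not part of the original answer; the sample only assumes the usual `<rss>` / `<channel>` / `<item>` nesting of an RSS 2.0 feed.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// sampleRSS is a hypothetical RSS 2.0 document used only for illustration.
const sampleRSS = `<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <link>http://example.com/</link>
    <description>A made-up feed</description>
    <item>
      <title>First post</title>
      <link>http://example.com/1</link>
      <description>Hello</description>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.com/2</link>
      <description>World</description>
    </item>
  </channel>
</rss>`

// Rss stands for the root <rss> element.
type Rss struct {
	Channel Channel `xml:"channel"`
}

// Channel holds the feed metadata plus every <item> child.
type Channel struct {
	Title       string `xml:"title"`
	Link        string `xml:"link"`
	Description string `xml:"description"`
	Items       []Item `xml:"item"`
}

type Item struct {
	Title       string `xml:"title"`
	Link        string `xml:"link"`
	Description string `xml:"description"`
}

// parseRSS is a hypothetical helper that unmarshals a whole document.
func parseRSS(data []byte) (*Rss, error) {
	var rss Rss
	if err := xml.Unmarshal(data, &rss); err != nil {
		return nil, err
	}
	return &rss, nil
}

func main() {
	rss, err := parseRSS([]byte(sampleRSS))
	if err != nil {
		fmt.Println("Error:", err)
		return
	}
	fmt.Println("Channel:", rss.Channel.Title)
	for _, item := range rss.Channel.Items {
		fmt.Println("-", item.Title, item.Link)
	}
}
```

To read from the network instead, the same `Rss` value can be passed to `xml.NewDecoder(res.Body).Decode`, as in the question.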
Answer 2
Score: 4
Your program comes close, but needs to specify just a little bit more context to match the XML document.
You need to revise your field tags to help guide the XML binding down through your Channel structure to your Item structure:
type Channel struct {
	Items []Item `xml:"channel>item"`
}

type Item struct {
	Title       string `xml:"title"`
	Link        string `xml:"link"`
	Description string `xml:"description"`
}
Per the documentation for encoding/xml.Unmarshal(), the seventh bullet item applies here:
> If the XML element contains a sub-element whose name matches
> the prefix of a tag formatted as "a" or "a>b>c", unmarshal
> will descend into the XML structure looking for elements with the
> given names, and will map the innermost elements to that struct
> field. A tag starting with ">" is equivalent to one starting
> with the field name followed by ">".
In your case, you're looking to descend through the top-level <rss> element's <channel> element to find each <item> element. Note, though, that we don't need to (and in fact can't) specify that the Channel struct should burrow through the top-level <rss> element by writing the Items field's tag as
`xml:"rss>channel>item"`
That context is implicit; the struct supplied to Unmarshal() already maps to the top-level XML element.
Note too that your Channel struct's Items field should be of type slice-of-Item, not just a single Item.
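To make the point about the implicit root concrete, here is a small sketch that decodes the same document with both tag forms. The inline `sample` document, the `ImplicitRoot` and `ExplicitRoot` type names, and the `countItems` helper are all made up for illustration.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// sample is a hypothetical, stripped-down RSS document.
const sample = `<rss version="2.0"><channel>
<item><title>First post</title></item>
<item><title>Second post</title></item>
</channel></rss>`

type Item struct {
	Title string `xml:"title"`
}

// ImplicitRoot relies on the struct itself standing for the root <rss>
// element, so the path starts at <channel>.
type ImplicitRoot struct {
	Items []Item `xml:"channel>item"`
}

// ExplicitRoot names <rss> in the path, which makes the decoder look for
// an <rss> child inside the root element; the sample has no such child.
type ExplicitRoot struct {
	Items []Item `xml:"rss>channel>item"`
}

// countItems decodes data with both structs and reports how many items
// each one collected.
func countItems(data []byte) (int, int, error) {
	var a ImplicitRoot
	if err := xml.Unmarshal(data, &a); err != nil {
		return 0, 0, err
	}
	var b ExplicitRoot
	if err := xml.Unmarshal(data, &b); err != nil {
		return 0, 0, err
	}
	return len(a.Items), len(b.Items), nil
}

func main() {
	implicit, explicit, err := countItems([]byte(sample))
	if err != nil {
		fmt.Println("Error:", err)
		return
	}
	fmt.Println("channel>item matched:", implicit)
	fmt.Println("rss>channel>item matched:", explicit)
}
```

With this sample, the `channel>item` tag should collect both items while the `rss>channel>item` tag collects none, and neither decode reports an error. That silent mismatch is the same reason the code in the question prints an empty title.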
You mentioned that you're having trouble getting the proposal to work. Here's a complete listing that I find works as one would expect:
package main

import (
	"encoding/xml"
	"fmt"
	"net/http"
	"os"
)

// Channel maps to the top-level <rss> element; the tag descends
// through <channel> to collect every <item>.
type Channel struct {
	Items []Item `xml:"channel>item"`
}

type Item struct {
	Title       string `xml:"title"`
	Link        string `xml:"link"`
	Description string `xml:"description"`
}

func main() {
	if res, err := http.Get("http://www.reddit.com/r/google.xml"); err != nil {
		fmt.Println("Error retrieving resource:", err)
		os.Exit(1)
	} else {
		channel := Channel{}
		// Decode straight from the response body into the struct.
		if err := xml.NewDecoder(res.Body).Decode(&channel); err != nil {
			fmt.Println("Error:", err)
			os.Exit(1)
		} else if len(channel.Items) != 0 {
			// Print only the first item as a quick check.
			item := channel.Items[0]
			fmt.Println("First title:", item.Title)
			fmt.Println("First link:", item.Link)
			fmt.Println("First description:", item.Description)
		}
	}
}
Answer 3
Score: 0
Nowadays the Reddit RSS feed seems to have changed to the Atom type. This means that regular RSS parsing will no longer work. The Atom functionality of go-rss can parse such feeds:
> // Feed struct for RSS
> type Feed struct {
>     Entry []Entry `xml:"entry"`
> }
>
> // Entry struct for each Entry in the Feed
> type Entry struct {
>     ID      string `xml:"id"`
>     Title   string `xml:"title"`
>     Updated string `xml:"updated"`
> }
>
> // Atom parses atom feeds
> func Atom(resp *http.Response) (*Feed, error) {
>     defer resp.Body.Close()
>     xmlDecoder := xml.NewDecoder(resp.Body)
>     xmlDecoder.CharsetReader = charset.NewReader
>     feed := Feed{}
>     if err := xmlDecoder.Decode(&feed); err != nil {
>         return nil, err
>     }
>     return &feed, nil
> }