XML parsing with Golang (abuse of entity attributes)
Question
I have XML formatted like this, coming from a database dump:
<table name="table1">
    <row>
        <col name="col1">value</col>
        <col name="col2">value</col>
        <col name="col3">value</col>
    </row>
    ...
</table>
<table name="table2">
    <row>
        <col name="col1">value</col>
        <col name="col2">value</col>
        <col name="col3">value</col>
        <col name="col4">value</col>
    </row>
    ...
</table>
I am trying to parse this in Go with the Decode() method of xml.Decoder, but I can't handle XML shaped like this.
I tried to adapt some of the examples I found by googling XML parsing in Go, but nothing suits this kind of format (with name attributes on every element).
Update
Thanks for the answers! What I'd like to achieve is this: say I have a table called "users" and a table called "categories"; I'd like to create the corresponding User and Category objects from that XML.
Based on your first answers I can easily use the parsed Table objects and then create my own objects from them, but I'd like to know whether I can skip that step and decode directly into my own objects instead.
Answer 1
Score: 1
I don't know what kind of issue you're dealing with, but your XML actually looks like a stream of XML values, so you should decode it as such:
type Table struct {
	Name string `xml:"name,attr"`
	Rows []Row  `xml:"row"`
}

type Row struct {
	Cols []Col `xml:"col"`
}

type Col struct {
	Name  string `xml:"name,attr"`
	Value string `xml:",chardata"`
}

//...

// Each Decode call consumes one <table> element. The loop ends on the
// first error, which is io.EOF once the input is exhausted.
var err error
dec := xml.NewDecoder(bytes.NewReader(b))
for err == nil {
	t := Table{}
	err = dec.Decode(&t)
	fmt.Printf("%v, %+v\n", err, t)
}
Playground: https://play.golang.org/p/w4PqLv-Uc0.
EDIT: If you want to decode such XML directly into appropriate structs, you'll need more complex machinery. Here's an example for users:
// Requires the encoding/xml, io and strconv packages.
type User struct {
	ID      int
	Name    string
	Surname string
}

// UnmarshalXML reads the <col> children of one <row> and assigns each
// value to the field selected by its name attribute.
func (u *User) UnmarshalXML(d *xml.Decoder, start xml.StartElement) error {
	var err error
	type col struct {
		Name  string `xml:"name,attr"`
		Value string `xml:",chardata"`
	}
	for {
		c := col{}
		err = d.Decode(&c)
		if err != nil {
			// Inside UnmarshalXML the decoder reports io.EOF at the
			// end of the enclosing <row> element.
			if err == io.EOF {
				break
			}
			return err
		}
		switch c.Name {
		case "id":
			u.ID, err = strconv.Atoi(c.Value)
			if err != nil {
				return err
			}
		case "name":
			u.Name = c.Value
		case "surname":
			u.Surname = c.Value
		}
	}
	return nil
}

type UserTable struct {
	Users []User `xml:"row"`
}
Playground: https://play.golang.org/p/Hqta3Ngjo3.
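For completeness, this is roughly how the pieces fit into a runnable program. Treat it as a sketch under assumptions: decodeUsers and the sample rows are invented for illustration, and the column names "id", "name" and "surname" are the ones assumed by the example above, not something taken from your dump.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strconv"
	"strings"
)

type User struct {
	ID      int
	Name    string
	Surname string
}

// UnmarshalXML fills a User from the <col> children of one <row>.
func (u *User) UnmarshalXML(d *xml.Decoder, start xml.StartElement) error {
	type col struct {
		Name  string `xml:"name,attr"`
		Value string `xml:",chardata"`
	}
	for {
		var c col
		err := d.Decode(&c)
		if err == io.EOF {
			return nil // end of the enclosing <row>
		}
		if err != nil {
			return err
		}
		switch c.Name {
		case "id":
			if u.ID, err = strconv.Atoi(c.Value); err != nil {
				return err
			}
		case "name":
			u.Name = c.Value
		case "surname":
			u.Surname = c.Value
		}
	}
}

type UserTable struct {
	Users []User `xml:"row"`
}

// decodeUsers (hypothetical helper) decodes one <table> element into users.
func decodeUsers(data string) ([]User, error) {
	var t UserTable
	if err := xml.NewDecoder(strings.NewReader(data)).Decode(&t); err != nil {
		return nil, err
	}
	return t.Users, nil
}

// sample is made-up data in the shape of the dump from the question.
const sample = `<table name="users">
  <row>
    <col name="id">1</col>
    <col name="name">Ada</col>
    <col name="surname">Lovelace</col>
  </row>
  <row>
    <col name="id">2</col>
    <col name="name">Alan</col>
    <col name="surname">Turing</col>
  </row>
</table>`

func main() {
	users, err := decodeUsers(sample)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	for _, u := range users {
		fmt.Printf("%+v\n", u)
	}
}
```

A Category type would need its own UnmarshalXML written along the same lines, with its own switch over column names.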
Answer 2
Score: 1
The doc of the xml.Unmarshal() function describes how the mapping between XML documents and structs works (supplemented by the doc of json.Marshal()). If you're not familiar with struct tags, check out this answer: https://stackoverflow.com/questions/10858787/what-are-the-uses-for-tags-in-go/30889373#30889373
First you need to model the XML document. Note that if you don't have a wrapper XML element around all the others, what you have is not one XML document but several, as in your case: you have multiple <table> documents.
You can model them like this:
type Col struct {
	Name  string `xml:"name,attr"`
	Value string `xml:",chardata"`
}

type Row struct {
	Cols []Col `xml:"col"`
}

type Table struct {
	Name string `xml:"name,attr"`
	Rows []Row  `xml:"row"`
}
And since you have multiple <table> elements, the easiest approach is to create an xml.Decoder with xml.NewDecoder() and parse each one separately with a Decoder.Decode() call (Decoder.Decode() will attempt to parse one XML document from its source reader).
This is how it can be done:
d := xml.NewDecoder(strings.NewReader(data))
for {
	var table Table
	if err := d.Decode(&table); err != nil {
		fmt.Println(err)
		break
	}
	fmt.Printf("%+v\n", table)
}
When there is no more data in the source string, Decoder.Decode() will report io.EOF. Running the above code with the following data string:
const data = `<table name="table1">
<row>
<col name="col1">value1</col>
<col name="col2">value2</col>
<col name="col3">value3</col>
</row>
<row>
<col name="col1">value4</col>
<col name="col2">value5</col>
<col name="col3">value6</col>
</row>
</table>
<table name="table2">
<row>
<col name="col1">value7</col>
<col name="col2">value8</col>
<col name="col3">value9</col>
<col name="col4">valueA</col>
</row>
</table>`
The output is:
{Name:table1 Rows:[{Cols:[{Name:col1 Value:value1} {Name:col2 Value:value2} {Name:col3 Value:value3}]} {Cols:[{Name:col1 Value:value4} {Name:col2 Value:value5} {Name:col3 Value:value6}]}]}
{Name:table2 Rows:[{Cols:[{Name:col1 Value:value7} {Name:col2 Value:value8} {Name:col3 Value:value9} {Name:col4 Value:valueA}]}]}
EOF
Try it on the Go Playground.
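The loop above prints io.EOF like any other error. If you want to tell a clean end of input apart from malformed XML, one option is to wrap the loop in a function. This is only a sketch: decodeTables, decodeTablesFromString and the shortened sample are hypothetical names and data, not part of encoding/xml.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

type Col struct {
	Name  string `xml:"name,attr"`
	Value string `xml:",chardata"`
}

type Row struct {
	Cols []Col `xml:"col"`
}

type Table struct {
	Name string `xml:"name,attr"`
	Rows []Row  `xml:"row"`
}

// decodeTables (hypothetical helper) reads consecutive <table> elements
// from r. io.EOF is treated as the normal end of input; any other error
// is returned together with the tables decoded so far.
func decodeTables(r io.Reader) ([]Table, error) {
	d := xml.NewDecoder(r)
	var tables []Table
	for {
		var t Table
		err := d.Decode(&t)
		if err == io.EOF {
			return tables, nil
		}
		if err != nil {
			return tables, err
		}
		tables = append(tables, t)
	}
}

// decodeTablesFromString is a convenience wrapper for in-memory data.
func decodeTablesFromString(data string) ([]Table, error) {
	return decodeTables(strings.NewReader(data))
}

// sample is a shortened, illustrative version of the data string above.
const sample = `<table name="table1"><row><col name="col1">value1</col></row></table>
<table name="table2"><row><col name="col1">value7</col></row></table>`

func main() {
	tables, err := decodeTablesFromString(sample)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	for _, t := range tables {
		fmt.Printf("%s: %d row(s)\n", t.Name, len(t.Rows))
	}
}
```

Taking an io.Reader means the same function works on a file or a network stream without loading the whole dump into memory first.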
Answer 3
Score: 1
Playground (minimal working example): https://play.golang.org/p/l7Vmj_8Igp
To unmarshal each element into a different type based on its name, you'll need to get each start element token, check the name, and then decode as appropriate. Something like the following (where getName is a function that returns the name attribute; see the playground link for a minimal working example):
d := xml.NewDecoder(…)
for {
	tok, err := d.Token()
	if err == io.EOF {
		// End of input: stop instead of treating it as a failure.
		break
	}
	if err != nil {
		log.Fatal(err)
	}
	start, ok := tok.(xml.StartElement)
	if !ok {
		// This is chardata or some other token.
		// If it's bad XML, it will be caught on the next call to Token().
		continue
	}
	switch getName(start) {
	case "user":
		u := &User{}
		err = d.DecodeElement(u, &start)
		if err != nil {
			log.Fatal(err)
		}
		// Do whatever you want to do with your struct here.
		fmt.Printf("Decoded a user: %+v\n", u)
	case "category":
		c := &Category{}
		err = d.DecodeElement(c, &start)
		if err != nil {
			log.Fatal(err)
		}
		fmt.Printf("Decoded a category: %+v\n", c)
	default:
		// If the name is unrecognized, skip the rest of the element.
		// We could also return an error.
		if err := d.Skip(); err != nil {
			log.Fatal(err)
		}
	}
}
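The snippet leaves getName undefined. One possible implementation, written here as a guess at its behavior rather than a copy of the playground code, scans the attributes of the start element; firstElementName is a hypothetical helper added only to exercise it.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// getName returns the value of the "name" attribute of a start element,
// or "" if the element has no such attribute. (Assumed behavior.)
func getName(start xml.StartElement) string {
	for _, attr := range start.Attr {
		if attr.Name.Local == "name" {
			return attr.Value
		}
	}
	return ""
}

// firstElementName (hypothetical helper) reads tokens until the first
// start element and returns its name attribute. It returns io.EOF if the
// input contains no start element.
func firstElementName(data string) (string, error) {
	d := xml.NewDecoder(strings.NewReader(data))
	for {
		tok, err := d.Token()
		if err != nil {
			return "", err
		}
		if start, ok := tok.(xml.StartElement); ok {
			return getName(start), nil
		}
	}
}

func main() {
	name, err := firstElementName(`<table name="users"><row/></table>`)
	if err != nil {
		fmt.Println("token error:", err)
		return
	}
	fmt.Println(name)
}
```

Returning an empty string for a missing attribute lets the default branch of the switch handle such elements.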