XML parsing with Golang (abuse of entity attributes)

Question

I have XML formatted like this, coming from a database dump:

<table name="table1">
    <row>
        <col name="col1">value</col>
        <col name="col2">value</col>
        <col name="col3">value</col>
    </row>
   ...
</table>
<table name="table2">
    <row>
        <col name="col1">value</col>
        <col name="col2">value</col>
        <col name="col3">value</col>
        <col name="col4">value</col>
    </row>
   ...
</table>

I am trying to parse this in Go with the Decode() method of xml.Decoder, but I can't handle XML shaped like this.
I tried to adapt some of the examples I found by googling XML parsing in Golang, but nothing suits this kind of format (with name attributes all over the elements).

Update

Thanks for the answers! What I'd like to achieve is this: say I have a table called "users" and a table called "categories"; I'd like to create the corresponding User and Category objects from that XML.
Based on your first answers, I can easily use the parsed Table objects and then create my own objects, but I'd like to know whether I can skip that step and decode directly into my own objects instead.

Answer 1

Score: 1

I don't know what kind of issue you're dealing with, but your XML actually looks like a stream of XML values, so you should decode it as such:

type Table struct {
	Name string `xml:"name,attr"`
	Rows []Row  `xml:"row"`
}

type Row struct {
	Cols []Col `xml:"col"`
}

type Col struct {
	Name  string `xml:"name,attr"`
	Value string `xml:",chardata"`
}

//...

var err error
dec := xml.NewDecoder(bytes.NewReader(b))
// Decode one <table> per iteration. The loop stops on the first error,
// which is io.EOF once the input is exhausted.
for err == nil {
	t := Table{}
	err = dec.Decode(&t)
	fmt.Printf("%v, %+v\n", err, t)
}

Playground: https://play.golang.org/p/w4PqLv-Uc0
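
For readers who want something they can paste and run, here is a minimal sketch of the same loop wrapped in a function. It treats io.EOF as the normal end of input instead of printing it. The function name decodeTables and the sample data are made up for illustration and are not part of the original answer.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

type Table struct {
	Name string `xml:"name,attr"`
	Rows []Row  `xml:"row"`
}

type Row struct {
	Cols []Col `xml:"col"`
}

type Col struct {
	Name  string `xml:"name,attr"`
	Value string `xml:",chardata"`
}

// decodeTables (hypothetical helper) reads consecutive top-level <table>
// elements until the input ends.
func decodeTables(data string) ([]Table, error) {
	dec := xml.NewDecoder(strings.NewReader(data))
	var tables []Table
	for {
		var t Table
		err := dec.Decode(&t)
		if err == io.EOF {
			// Clean end of input: no further elements to decode.
			return tables, nil
		}
		if err != nil {
			return tables, err
		}
		tables = append(tables, t)
	}
}

// sample is made-up data in the shape shown in the question.
const sample = `<table name="table1">
    <row>
        <col name="col1">a</col>
        <col name="col2">b</col>
    </row>
</table>
<table name="table2">
    <row>
        <col name="col1">c</col>
    </row>
</table>`

func main() {
	tables, err := decodeTables(sample)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	for _, t := range tables {
		fmt.Printf("%s: %d row(s)\n", t.Name, len(t.Rows))
	}
}
```

Returning the tables decoded so far together with the error is one possible design; a caller that prefers all-or-nothing behaviour could return nil instead.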


EDIT: If you want to decode such XML directly into the appropriate structs, you'll need more complex machinery. Here's an example for users:

type User struct {
	ID      int
	Name    string
	Surname string
}

func (u *User) UnmarshalXML(d *xml.Decoder, start xml.StartElement) error {
	var err error

	// col mirrors a single <col name="...">value</col> element.
	type col struct {
		Name  string `xml:"name,attr"`
		Value string `xml:",chardata"`
	}

	for {
		c := col{}
		err = d.Decode(&c)
		if err != nil {
			// Inside UnmarshalXML, io.EOF signals the end of the current <row>.
			if err == io.EOF {
				break
			}

			return err
		}
		// Map each column onto the matching struct field by its name attribute.
		switch c.Name {
		case "id":
			u.ID, err = strconv.Atoi(c.Value)
			if err != nil {
				return err
			}
		case "name":
			u.Name = c.Value
		case "surname":
			u.Surname = c.Value
		}
	}

	return nil
}

type UserTable struct {
	Users []User `xml:"row"`
}

Playground: https://play.golang.org/p/Hqta3Ngjo3
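
In case the playground link is unavailable, the following is a self-contained sketch of how the custom unmarshaler above might be exercised. The helper parseUsers, the constant usersXML and the sample values are assumptions added for illustration; only User, UserTable and the UnmarshalXML method follow the answer.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strconv"
)

type User struct {
	ID      int
	Name    string
	Surname string
}

// UnmarshalXML fills a User from the <col name="..."> children of one <row>.
func (u *User) UnmarshalXML(d *xml.Decoder, start xml.StartElement) error {
	type col struct {
		Name  string `xml:"name,attr"`
		Value string `xml:",chardata"`
	}
	for {
		var c col
		err := d.Decode(&c)
		if err == io.EOF {
			return nil // the enclosing <row> has been fully consumed
		}
		if err != nil {
			return err
		}
		switch c.Name {
		case "id":
			if u.ID, err = strconv.Atoi(c.Value); err != nil {
				return err
			}
		case "name":
			u.Name = c.Value
		case "surname":
			u.Surname = c.Value
		}
	}
}

type UserTable struct {
	Users []User `xml:"row"`
}

// usersXML is made-up sample data for a single "users" table.
const usersXML = `<table name="users">
    <row>
        <col name="id">1</col>
        <col name="name">Ada</col>
        <col name="surname">Lovelace</col>
    </row>
    <row>
        <col name="id">2</col>
        <col name="name">Alan</col>
        <col name="surname">Turing</col>
    </row>
</table>`

// parseUsers (hypothetical helper) decodes one <table> straight into Users.
func parseUsers(data string) ([]User, error) {
	var t UserTable
	if err := xml.Unmarshal([]byte(data), &t); err != nil {
		return nil, err
	}
	return t.Users, nil
}

func main() {
	users, err := parseUsers(usersXML)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	for _, u := range users {
		fmt.Printf("%+v\n", u)
	}
}
```

Columns with unrecognized names are silently ignored in this sketch; returning an error from the switch's default branch would be a stricter alternative.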

Answer 2

Score: 1

The doc of the xml.Unmarshal() function describes how the mapping between XML documents and structs works (supplemented by the doc of xml.Marshal()). If you're not familiar with struct tags, check out this answer: https://stackoverflow.com/questions/10858787/what-are-the-uses-for-tags-in-go/30889373#30889373

First you need to model the XML document. Note that if you don't have a wrapper XML element around all the others, that is not one XML document but several, as in your case: you have multiple <table> documents.

You can model them like this:

type Col struct {
	Name  string `xml:"name,attr"`
	Value string `xml:",chardata"`
}

type Row struct {
	Cols []Col `xml:"col"`
}

type Table struct {
	Name string `xml:"name,attr"`
	Rows []Row  `xml:"row"`
}
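
To make the "several documents" point concrete, here is a small sketch of what a single xml.Unmarshal() call does with such input. As far as I can tell it decodes the first top-level element and leaves the rest unread, without reporting an error. The trimmed-down Table type, the helper firstTableName and the sample string are assumptions made for this illustration.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// Table is reduced to the name attribute for this illustration.
type Table struct {
	Name string `xml:"name,attr"`
}

// twoTables is made-up input with two top-level elements and no wrapper.
const twoTables = `<table name="table1"></table>
<table name="table2"></table>`

// firstTableName (hypothetical helper) decodes with a single xml.Unmarshal
// call, which reads one element from the input.
func firstTableName(data string) (string, error) {
	var t Table
	if err := xml.Unmarshal([]byte(data), &t); err != nil {
		return "", err
	}
	return t.Name, nil
}

func main() {
	name, err := firstTableName(twoTables)
	// Only the first <table> is reflected in the result.
	fmt.Println(name, err)
}
```

This is why a single Unmarshal call is not enough here and the elements have to be read one at a time.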

And since you have multiple <table> elements, the easiest approach is to create an xml.Decoder and parse each one separately with a Decoder.Decode() call (Decoder.Decode() will attempt to parse one XML document from its source reader).

This is how it can be done:

d := xml.NewDecoder(strings.NewReader(data))
for {
	var table Table
	// Any error ends the loop, including io.EOF at the end of the input.
	if err := d.Decode(&table); err != nil {
		fmt.Println(err)
		break
	}
	fmt.Printf("%+v\n", table)
}

When there is no more data in the source string, Decoder.Decode() will report io.EOF. Running the above code with the following data string:

const data = `<table name="table1">
    <row>
        <col name="col1">value1</col>
        <col name="col2">value2</col>
        <col name="col3">value3</col>
    </row>
    <row>
        <col name="col1">value4</col>
        <col name="col2">value5</col>
        <col name="col3">value6</col>
    </row>
</table>
<table name="table2">
    <row>
        <col name="col1">value7</col>
        <col name="col2">value8</col>
        <col name="col3">value9</col>
        <col name="col4">valueA</col>
    </row>
</table>`

The output is:

{Name:table1 Rows:[{Cols:[{Name:col1 Value:value1} {Name:col2 Value:value2} {Name:col3 Value:value3}]} {Cols:[{Name:col1 Value:value4} {Name:col2 Value:value5} {Name:col3 Value:value6}]}]}
{Name:table2 Rows:[{Cols:[{Name:col1 Value:value7} {Name:col2 Value:value8} {Name:col3 Value:value9} {Name:col4 Value:valueA}]}]}
EOF

Try it on the Go Playground.
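
The question's update mentions building your own objects from the parsed Table values. A minimal sketch of that intermediate step could look like the following. The User type, the "id" and "name" column names, and the helpers rowToMap and usersFromTable are all hypothetical and only illustrate the idea.

```go
package main

import (
	"fmt"
	"strconv"
)

type Col struct {
	Name  string `xml:"name,attr"`
	Value string `xml:",chardata"`
}

type Row struct {
	Cols []Col `xml:"col"`
}

type Table struct {
	Name string `xml:"name,attr"`
	Rows []Row  `xml:"row"`
}

// User is a hypothetical target type; the "id" and "name" columns are assumed.
type User struct {
	ID   int
	Name string
}

// rowToMap indexes the columns of a row by their name attribute.
func rowToMap(r Row) map[string]string {
	m := make(map[string]string, len(r.Cols))
	for _, c := range r.Cols {
		m[c.Name] = c.Value
	}
	return m
}

// usersFromTable converts every row of an already decoded table into a User.
func usersFromTable(t Table) ([]User, error) {
	users := make([]User, 0, len(t.Rows))
	for i, r := range t.Rows {
		m := rowToMap(r)
		id, err := strconv.Atoi(m["id"])
		if err != nil {
			return nil, fmt.Errorf("row %d: bad id %q: %w", i, m["id"], err)
		}
		users = append(users, User{ID: id, Name: m["name"]})
	}
	return users, nil
}

func main() {
	// A Table value shaped like the output of the decoding loop (sample data).
	tbl := Table{Name: "users", Rows: []Row{
		{Cols: []Col{{Name: "id", Value: "1"}, {Name: "name", Value: "Ada"}}},
		{Cols: []Col{{Name: "id", Value: "2"}, {Name: "name", Value: "Alan"}}},
	}}
	users, err := usersFromTable(tbl)
	if err != nil {
		fmt.Println("conversion error:", err)
		return
	}
	fmt.Printf("%+v\n", users)
}
```

Going through a map keeps the conversion independent of column order, at the cost of one small allocation per row.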

Answer 3

Score: 1

To unmarshal each element into a different type based on its name attribute, you'll need to get each start element token, check the name, and then decode as appropriate. Something like the following (where getName is a function that returns the name attribute; see the playground link for a minimal working example):

d := xml.NewDecoder(…)
for {
	tok, err := d.Token()
	if err == io.EOF {
		// No more tokens: the input has been fully consumed.
		break
	}
	if err != nil {
		log.Fatal(err)
	}

	start, ok := tok.(xml.StartElement)
	if !ok {
		// This is chardata or some other token.
		// If it's bad XML, it will be caught on the next call to Token().
		continue
	}
	switch getName(start) {
	case "user":
		u := &User{}
		err = d.DecodeElement(u, &start)
		if err != nil {
			log.Fatal(err)
		}
		// Do whatever you want to do with your struct here.
		fmt.Printf("Decoded a user: %+v\n", u)
	case "category":
		c := &Category{}
		err = d.DecodeElement(c, &start)
		if err != nil {
			log.Fatal(err)
		}
		fmt.Printf("Decoded a category: %+v\n", c)
	default:
		// If the name is unrecognized, skip the rest of the element.
		// We could also return an error.
		if err := d.Skip(); err != nil {
			log.Fatal(err)
		}
	}
}

https://play.golang.org/p/l7Vmj_8Igp

huangapple
  • Published on October 24, 2016 at 16:52:15
  • When reposting, please keep the link to this article: https://go.coder-hub.com/40214265.html