Parsing a malformed XML file in Go
Question
I have a large number of XML files to parse that contain unclosed tags nested inside properly closed tags, something like this:
<submission>
<first-name>Henry
<last-name>Donald
<id>4224
</submission>
I set decoder.Strict = false, but the decoder still does not parse the whole XML file properly.
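package main

import (
	"bytes"
	"encoding/xml"
	"fmt"
)

// sub is the sample document shown above.
const sub = `<submission>
<first-name>Henry
<last-name>Donald
<id>4224
</submission>`

type Submission struct {
	FirstName string `xml:"first-name"`
	LastName  string `xml:"last-name"`
	ID        string `xml:"id"`
}

func main() {
	dec := xml.NewDecoder(bytes.NewReader([]byte(sub)))
	dec.Strict = false
	dec.AutoClose = xml.HTMLAutoClose
	dec.Entity = xml.HTMLEntity
	var s Submission
	err := dec.Decode(&s)
	if err != nil {
		fmt.Println(err)
	}
	fmt.Println(s)
}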
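A quick way to see what actually goes wrong is to print the fields with %q. As far as I can tell from the encoding/xml sources, with Strict = false the mismatched </submission> makes the decoder invent end tags for the children that are still open, so <last-name> ends up nested inside <first-name> and <id> inside <last-name>. A string field keeps only its own element's text and skips child elements, so FirstName should come back as "Henry\n" while LastName and ID stay empty. Note that xml.HTMLAutoClose only lists HTML elements that have no end tag (br, img, hr and so on), so it has no effect on these tag names. The sketch below reuses the question's decoder settings; decodeQuestion is a hypothetical helper name.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

type Submission struct {
	FirstName string `xml:"first-name"`
	LastName  string `xml:"last-name"`
	ID        string `xml:"id"`
}

// decodeQuestion (hypothetical helper) decodes src with the same
// non-strict settings used in the question and returns the struct.
func decodeQuestion(src string) (Submission, error) {
	dec := xml.NewDecoder(strings.NewReader(src))
	dec.Strict = false
	dec.AutoClose = xml.HTMLAutoClose
	dec.Entity = xml.HTMLEntity
	var s Submission
	err := dec.Decode(&s)
	return s, err
}

func main() {
	src := "<submission>\n<first-name>Henry\n<last-name>Donald\n<id>4224\n</submission>"
	s, err := decodeQuestion(src)
	if err != nil {
		fmt.Println(err)
	}
	// %q makes the trailing newline and the empty fields visible.
	fmt.Printf("%q %q %q\n", s.FirstName, s.LastName, s.ID)
}
```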
Playground: https://play.golang.org/p/-_chEpDhzX
I know there is an HTML tokenizer I could try, but I would prefer to use the XML package, since the majority of the files are well-formed.
Answer 1
Score: 2
The following worked for me, though it is probably only practical if you know which tags are problematic. Strangely, it does not work if I also add first-name.
dec.AutoClose = append(dec.AutoClose, "last-name")
dec.AutoClose = append(dec.AutoClose, "id")
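It may be worth checking where the values really land with this approach. The encoding/xml documentation describes AutoClose as a set of elements to consider closed immediately after they are opened, regardless of whether an end element is present. Read literally, the text after <last-name> and <id> is then reported while <first-name> is still the innermost open element, so it would be collected into FirstName rather than LastName and ID. Adding first-name as well would move all of the text up to <submission>, which has no field to receive it; that would fit the observation above. The sketch below (textOwners is a hypothetical helper) records which element each text run is attributed to under different AutoClose lists:

```go
package main

import (
	"encoding/xml"
	"errors"
	"fmt"
	"io"
	"strings"
)

// textOwners (hypothetical helper) records, for every non-blank text run,
// the innermost element that was open when the decoder reported it.
func textOwners(src string, autoClose []string) (map[string]string, error) {
	dec := xml.NewDecoder(strings.NewReader(src))
	dec.Strict = false
	dec.AutoClose = autoClose
	var open []string // names of currently open elements, outermost first
	owners := map[string]string{}
	for {
		tok, err := dec.Token()
		if errors.Is(err, io.EOF) {
			return owners, nil
		}
		if err != nil {
			return owners, err
		}
		switch t := tok.(type) {
		case xml.StartElement:
			open = append(open, t.Name.Local)
		case xml.EndElement:
			// Includes end tags the decoder invents in non-strict mode.
			if len(open) > 0 {
				open = open[:len(open)-1]
			}
		case xml.CharData:
			text := strings.TrimSpace(string(t))
			if text != "" && len(open) > 0 {
				owners[text] = open[len(open)-1]
			}
		}
	}
}

func main() {
	src := "<submission>\n<first-name>Henry\n<last-name>Donald\n<id>4224\n</submission>"
	lists := [][]string{nil, {"last-name", "id"}, {"first-name", "last-name", "id"}}
	for _, ac := range lists {
		owners, err := textOwners(src, ac)
		fmt.Println(ac, owners, err)
	}
}
```

If the second line attributes Donald and 4224 to first-name, the struct is being filled through FirstName alone and the values would still need to be split apart by hand.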
Answer 2
Score: -1
There is no way around it; you need your own decoder: http://play.golang.org/p/Kr7nq64f-c
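One possible shape for such a decoder, sketched under the assumption that the only values of interest are the three leaf elements from the question (this is an illustration, not the code behind the playground link): walk the token stream with Token() and assign each run of text to the element that was opened most recently, whether or not that element is ever closed. Strict = false is still needed so that the final </submission> implicitly closes the open children instead of producing a syntax error. decodeLenient is a hypothetical name.

```go
package main

import (
	"encoding/xml"
	"errors"
	"fmt"
	"io"
	"strings"
)

type Submission struct {
	FirstName string
	LastName  string
	ID        string
}

// decodeLenient (hypothetical helper) assigns each non-blank text run to
// the most recently opened element, so missing end tags do not matter.
func decodeLenient(src string) (Submission, error) {
	dec := xml.NewDecoder(strings.NewReader(src))
	dec.Strict = false // let </submission> implicitly close open children
	var s Submission
	current := "" // local name of the element whose text we are reading
	for {
		tok, err := dec.Token()
		if errors.Is(err, io.EOF) {
			return s, nil
		}
		if err != nil {
			return s, err
		}
		switch t := tok.(type) {
		case xml.StartElement:
			current = t.Name.Local
		case xml.EndElement:
			current = "" // text after a real end tag belongs to no field
		case xml.CharData:
			text := strings.TrimSpace(string(t))
			if text == "" {
				break // ignore whitespace between tags
			}
			switch current {
			case "first-name":
				s.FirstName = text
			case "last-name":
				s.LastName = text
			case "id":
				s.ID = text
			}
		}
	}
}

func main() {
	src := "<submission>\n<first-name>Henry\n<last-name>Donald\n<id>4224\n</submission>"
	s, err := decodeLenient(src)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("%+v\n", s)
}
```

Because it ignores where elements end, this treats the malformed files and the well-formed majority the same way, but it does not handle repeated fields, attributes, or text mixed with child elements.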