HTML Validation with Golang

Question

Within my API I have a POST endpoint. One of the expected parameters posted to that endpoint is a block of (loosely) valid HTML.

The POST body will be JSON.

In Go, how can I ensure that the posted HTML is valid? I have been looking for a few days now and still haven't managed to find anything.

The term "valid" is used loosely here. I am trying to ensure that tags are opened and closed, that quotation marks are in the right places, and so on.

Answer 1

Score: 7

A bit late to the game, but here are a couple of parsers in Go that will work if you only want to validate the structure of the HTML (e.g. you don't care whether a div is inside a span, which is not allowed but is a schema-level problem):

x/net/html

The golang.org/x/net/html package contains a very loose parser. Almost anything will be accepted as valid HTML, similar to what many web browsers try to do (e.g. it will ignore problems with unescaped values in many cases).
For example, something like <span>></span> will likely validate (I didn't check this particular one, I just made it up) as a span containing the '>' character.

It can be used like this:

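import (
	"io"
	"strings"

	"golang.org/x/net/html"
)

// validate tokenizes input, e.g. validate(`<span>></span>`), and returns
// any tokenizer error other than io.EOF.
func validate(input string) error {
	z := html.NewTokenizer(strings.NewReader(input))
	for {
		if z.Next() == html.ErrorToken {
			if err := z.Err(); err != io.EOF {
				return err
			}
			// Not an error, we're done and it's valid!
			return nil
		}
	}
}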

encoding/xml

If you need something slightly stricter that is still acceptable for HTML, you can configure an xml.Decoder to work with HTML (this is what I do; it lets me choose how strict to be in any given situation):

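import (
	"encoding/xml"
	"io"
	"strings"
)

// validate reads every token from input, e.g. validate(`<html></html>`),
// and returns the first decoding error, if any.
func validate(input string) error {
	d := xml.NewDecoder(strings.NewReader(input))

	// Configure the decoder for HTML; leave Strict and AutoClose at their defaults for XHTML
	d.Strict = false
	d.AutoClose = xml.HTMLAutoClose
	d.Entity = xml.HTMLEntity
	for {
		// Only the error matters here, so the token itself is discarded.
		_, err := d.Token()
		switch err {
		case io.EOF:
			return nil // We're done, it's valid!
		case nil:
			// Keep reading tokens.
		default:
			return err // Oops, something wasn't right
		}
	}
}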
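
To make the "how strict" trade-off concrete, here is a minimal sketch that puts the decoder configuration behind a flag. The helper name checkMarkup and its strict parameter are assumptions made for illustration and are not part of the answer above; the sample inputs are made up as well.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// checkMarkup is a hypothetical helper (not from the answer above). It reads
// every token from input and returns the first decoding error, or nil.
// With strict set, the input must be well-formed XML (XHTML style); otherwise
// the decoder uses the HTML-oriented settings from the answer.
func checkMarkup(input string, strict bool) error {
	d := xml.NewDecoder(strings.NewReader(input))
	d.Entity = xml.HTMLEntity // accept named HTML entities such as &nbsp;
	if !strict {
		d.Strict = false
		d.AutoClose = xml.HTMLAutoClose // elements like <br> need no end tag
	}
	for {
		_, err := d.Token()
		if err == io.EOF {
			return nil
		}
		if err != nil {
			return err
		}
	}
}

func main() {
	samples := []string{
		`<p>hello</p>`,
		`<p>one<br>two</p>`,
		`<p>never closed`,
		`<p><b>unbalanced</p>`,
	}
	for _, s := range samples {
		fmt.Printf("%q\n  strict:  %v\n  lenient: %v\n", s, checkMarkup(s, true), checkMarkup(s, false))
	}
}
```

Strict mode rejects anything that is not well-formed XML, including a bare <br>, while the lenient configuration tolerates the void elements listed in xml.HTMLAutoClose. The lenient decoder may also repair some mismatched end tags instead of reporting them, so it is worth running your own inputs through both modes before choosing one.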

Answer 2

Score: 2

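You can check that the HTML blob provided parses correctly using html.Parse from the golang.org/x/net/html package. For validation only, all you have to do is check the returned error. Keep in mind that html.Parse follows the HTML5 parsing rules and recovers from most malformed markup, so this check is very lenient.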

Answer 3

Score: 0

Use golang.org/x/net/html:

import (
	"strings"

	"golang.org/x/net/html"
)

// isValidHTML reports whether html.Parse can parse htmlStr without returning an error.
func isValidHTML(htmlStr string) bool {
	_, err := html.Parse(strings.NewReader(htmlStr))
	return err == nil
}

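This code imports the strings and golang.org/x/net/html packages and defines a function named isValidHTML that checks whether a given HTML string is valid. The function parses the string with html.Parse and returns true if parsing produced no error, and false otherwise.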

huangapple
  • Posted on 2015-08-03 21:05:21
  • When reposting, please keep the link to this article: https://go.coder-hub.com/31788134.html