HTML Validation with Golang
Question
Within my API I have a POST endpoint. One of the expected parameters posted to that endpoint is a block of (loosely) valid HTML.
The POST body will be JSON.
In Go, how can I ensure that the posted HTML is valid? I have been looking for a few days now and still haven't managed to find anything.
The term "valid" is kind of loose. I am trying to ensure that tags are opened and closed, quotation marks are in the right places, etc.
Answer 1
Score: 7
A bit late to the game, but here are a couple of parsers in Go that will work if you only want to validate the structure of the HTML (e.g. you don't care whether a div is inside a span, which is not allowed but is a schema-level problem):
x/net/html
The golang.org/x/net/html package contains a very loose parser. Almost any input is accepted as valid HTML, similar to what many web browsers do (e.g. it ignores problems with unescaped values in many cases).
For example, something like <span>></span> will likely validate (I didn't check this particular one, I just made it up) as a span with the '>' character in it.
It can be used something like this:
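    import (
        "io"
        "strings"

        "golang.org/x/net/html"
    )

    // validateHTML tokenizes a sample document and returns the first
    // tokenizer error other than io.EOF.
    func validateHTML() error {
        r := strings.NewReader(`<span>></span>`)
        z := html.NewTokenizer(r)
        for {
            tt := z.Next()
            if tt == html.ErrorToken {
                err := z.Err()
                if err == io.EOF {
                    // Not an error, we're done and it's valid!
                    return nil
                }
                return err
            }
        }
    }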
encoding/xml
If you need something a tiny bit more strict, but which is still okay for HTML, you can configure an xml.Decoder to work with HTML (this is what I do; it lets me be a bit more flexible about how strict I want to be in any given situation):
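    import (
        "encoding/xml"
        "io"
        "strings"
    )

    // validateXML reads every token of a sample document and returns the
    // first error other than io.EOF.
    func validateXML() error {
        r := strings.NewReader(`<html></html>`)
        d := xml.NewDecoder(r)
        // Configure the decoder for HTML; leave off strict and autoclose for XHTML
        d.Strict = false
        d.AutoClose = xml.HTMLAutoClose
        d.Entity = xml.HTMLEntity
        for {
            // Only the error matters here; an unused variable would not compile.
            _, err := d.Token()
            switch err {
            case io.EOF:
                return nil // We're done, it's valid!
            case nil:
                // No error yet, keep reading tokens.
            default:
                return err // Oops, something wasn't right
            }
        }
    }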
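To make the "how strict do I want to be" point concrete, here is a minimal sketch using only the standard library. The helper name validateMarkup and its strict flag are assumptions introduced for illustration and are not part of the answer above. With strict set to true the decoder keeps the default XML rules, so it rejects things like unquoted attribute values and an unclosed <br>. With strict set to false it applies the HTML-friendly settings shown above.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// validateMarkup is a hypothetical helper (not from the answer). It reads
// every token from s and returns the first syntax error, or nil once the
// decoder reaches a clean io.EOF.
func validateMarkup(s string, strict bool) error {
	d := xml.NewDecoder(strings.NewReader(s))
	d.Entity = xml.HTMLEntity // accept named HTML entities such as &nbsp;
	if !strict {
		d.Strict = false
		d.AutoClose = xml.HTMLAutoClose // let void elements such as <br> close themselves
	}
	for {
		_, err := d.Token()
		if err == io.EOF {
			return nil
		}
		if err != nil {
			return err
		}
	}
}

func main() {
	inputs := []string{
		`<p>one<br>two</p>`,
		`<p class=x>hi</p>`,
		`<a><b></a></b>`,
	}
	for _, in := range inputs {
		fmt.Printf("%-20s strict=%v lenient=%v\n",
			in, validateMarkup(in, true), validateMarkup(in, false))
	}
}
```

Running the program prints the result of both modes for each input, which is a quick way to decide which level of strictness fits a given endpoint.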
Answer 2
Score: 2
You can check that the HTML blob provided parses correctly using html.Parse from the golang.org/x/net/html package. For validation only, all you have to do is check for errors.
Answer 3
Score: 0
Use the golang.org/x/net/html package:
    import (
        "strings"

        "golang.org/x/net/html"
    )

    // isValidHTML reports whether htmlStr parses without an error.
    func isValidHTML(htmlStr string) bool {
        _, err := html.Parse(strings.NewReader(htmlStr))
        return err == nil
    }
This code imports the strings and golang.org/x/net/html packages and defines a function named isValidHTML that checks whether a given HTML string is valid. It parses the string with html.Parse and returns true if parsing produced no error, and false otherwise.