Golang web app security: should you check if input is valid UTF-8?
Question
According to several best practice documents, it is a good idea to check whether the input data is UTF-8 or not.
In my project, I use Gin and thus go-playground/validator for validation. There is an "ascii" validator, but no "utf-8" validator.
I found https://pkg.go.dev/unicode/utf8#ValidString and wondered whether checking the inputs with it would help, or whether valid UTF-8 is already guaranteed because Go itself uses Unicode internally.
Here is an example:
```go
package main

import (
	"net/http"

	"github.com/gin-gonic/gin"
)

// User is the request payload; Gin checks the binding tags with
// go-playground/validator.
type User struct {
	Name string `json:"name" binding:"required,alphanum"`
}

func main() {
	r := gin.Default()
	r.POST("/user", createUserHandler)
	r.Run()
}

func createUserHandler(c *gin.Context) {
	var newUser User
	// Decode the JSON body into newUser, then run the binding validators.
	err := c.ShouldBindJSON(&newUser)
	if err != nil {
		c.AbortWithError(http.StatusBadRequest, err)
		return
	}
	c.Status(http.StatusCreated)
}
```
Is it guaranteed that `newUser.Name` is valid UTF-8 after calling `c.ShouldBindJSON`? Is there any advantage in checking it with `utf8.ValidString`?
Answer 1
Score: 4
By default, Gin uses the standard `encoding/json` package to unmarshal JSON documents. The documentation for that package says:
> When unmarshaling quoted strings, invalid UTF-8 or invalid UTF-16 surrogate pairs are not treated as an error. Instead, they are replaced by the Unicode replacement character U+FFFD.
The decoded string values are therefore guaranteed to be valid UTF-8, so there is no advantage to checking them with `utf8.ValidString`.
Depending on the application requirements, you may want to check for and handle the Unicode replacement character, "�". Aside: as demonstrated by the � in this answer, Stack Overflow handles the Unicode replacement character like any other character.
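A minimal sketch of such a check, using only the standard library. `hasReplacementChar` is a hypothetical helper name; it relies on the fact that `utf8.RuneError` is the constant U+FFFD.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// hasReplacementChar (hypothetical helper) reports whether s contains
// U+FFFD, the character encoding/json substitutes for invalid UTF-8 and
// invalid UTF-16 surrogate pairs.
func hasReplacementChar(s string) bool {
	return strings.ContainsRune(s, utf8.RuneError)
}

func main() {
	for _, name := range []string{"alice", "ab\uFFFDc"} {
		if hasReplacementChar(name) {
			fmt.Printf("%+q: rejected, contains U+FFFD\n", name)
		} else {
			fmt.Printf("%+q: accepted\n", name)
		}
	}
}
```

Note that a client can also send U+FFFD on purpose, either literally or as the JSON escape `\ufffd`, so finding it does not prove that the original bytes were malformed. To run a check like this during binding, Gin's documentation shows registering custom validators via `binding.Validator.Engine().(*validator.Validate).RegisterValidation`. In the question's example it is likely unnecessary: go-playground/validator's `alphanum` accepts only ASCII letters and digits, so a name containing U+FFFD already fails validation.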
> Go itself uses Unicode internally?
Some language features use UTF-8 encoding (range on string, conversions between []rune and string), but those features do not restrict the bytes that can be stored in a string. Strings can contain any sequence of bytes including invalid UTF-8.