Golang web app security: should you check if input is valid UTF-8?

Question

According to several best-practice documents, it is a good idea to check whether input data is valid UTF-8.

In my project, I use Gin, and therefore go-playground/validator, for validation. There is an "ascii" validator, but no "utf-8" validator.

I found https://pkg.go.dev/unicode/utf8#ValidString and wondered whether it would help to check the inputs with it, or whether valid UTF-8 is already guaranteed because Go itself uses Unicode internally.

Here is an example:

package main

import (
	"net/http"

	"github.com/gin-gonic/gin"
)

type User struct {
	Name string `json:"name" binding:"required,alphanum"`
}

func main() {
	r := gin.Default()
	r.POST("/user", createUserHandler)
	r.Run()
}

func createUserHandler(c *gin.Context) {
	var newUser User
	err := c.ShouldBindJSON(&newUser)

	if err != nil {
		c.AbortWithError(http.StatusBadRequest, err)
		return
	}

	c.Status(http.StatusCreated)
}

Is it guaranteed that, after calling c.ShouldBindJSON, the name in newUser is valid UTF-8? Is there any advantage in checking name with utf8.ValidString?

Answer 1

Score: 4

By default, Gin uses the standard encoding/json package to unmarshal JSON documents. The documentation for that package says:

> When unmarshaling quoted strings, invalid UTF-8 or invalid UTF-16 surrogate pairs are not treated as an error. Instead, they are replaced by the Unicode replacement character U+FFFD.

The decoded string values are therefore guaranteed to be valid UTF-8. There is no advantage to checking them with utf8.ValidString.
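The behavior quoted above can be checked with a small program. This is a minimal sketch that calls encoding/json directly rather than going through Gin; the `user` type and the `decodeName` helper are hypothetical names introduced only for this illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"unicode/utf8"
)

// user mirrors the User struct from the question, without the Gin binding tag.
// (Hypothetical type, for illustration only.)
type user struct {
	Name string `json:"name"`
}

// decodeName unmarshals doc with encoding/json and returns the decoded name.
// (Hypothetical helper, for illustration only.)
func decodeName(doc []byte) (string, error) {
	var u user
	if err := json.Unmarshal(doc, &u); err != nil {
		return "", err
	}
	return u.Name, nil
}

func main() {
	// The byte 0xff never appears in valid UTF-8, so this document is not valid UTF-8.
	doc := []byte("{\"name\":\"a\xffb\"}")
	fmt.Println(utf8.Valid(doc)) // false

	name, err := decodeName(doc)
	if err != nil {
		fmt.Println("unmarshal error:", err)
		return
	}

	// The invalid byte has been replaced by U+FFFD, so the result is valid UTF-8.
	fmt.Printf("%+q\n", name)           // "a\ufffdb"
	fmt.Println(utf8.ValidString(name)) // true
}
```

Unmarshaling does not fail here: the invalid byte is silently replaced, which is why a later utf8.ValidString check on the decoded value always passes.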

Depending on the application's requirements, you may want to check for and handle the Unicode replacement character, "�". Aside: as the � in this answer demonstrates, Stack Overflow treats the Unicode replacement character like any other character.
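One way such a check could look is sketched below. The helper name `containsReplacementChar` is an assumption made for this example and is not part of Gin or go-playground/validator. Note that the check cannot tell whether the client sent a literal U+FFFD or whether the decoder substituted it for invalid bytes.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// containsReplacementChar reports whether s contains U+FFFD.
// (Hypothetical helper, for illustration only.)
//
// utf8.RuneError has the value U+FFFD. For strings that are already valid
// UTF-8, such as values decoded by encoding/json, this matches only a
// literal replacement character.
func containsReplacementChar(s string) bool {
	return strings.ContainsRune(s, utf8.RuneError)
}

func main() {
	fmt.Println(containsReplacementChar("alice"))    // false
	fmt.Println(containsReplacementChar("a\uFFFDb")) // true
}
```

A handler could call a helper like this after binding and reject the request, for example with http.StatusBadRequest, if the application does not want to store such values.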

> Go itself uses Unicode internally?

Some language features assume UTF-8 encoding (range over a string, conversions between []rune and string), but they do not restrict the bytes that can be stored in a string. A string can contain any sequence of bytes, including invalid UTF-8.
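A short sketch of this point, using only the standard library; the `runes` helper is a hypothetical name introduced for the example.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runes collects the runes produced by ranging over s.
// (Hypothetical helper, for illustration only.)
func runes(s string) []rune {
	var out []rune
	for _, r := range s {
		out = append(out, r)
	}
	return out
}

func main() {
	// A string can hold arbitrary bytes; 0xff is not valid UTF-8.
	s := "a\xffb"
	fmt.Println(len(s))              // 3
	fmt.Println(utf8.ValidString(s)) // false

	// Ranging over the string yields U+FFFD for the invalid byte,
	// but the string itself is left unchanged.
	fmt.Printf("%U\n", runes(s)) // [U+0061 U+FFFD U+0062]

	// Converting to []rune and back is therefore lossy for invalid input.
	fmt.Println(string([]rune(s)) == s) // false
}
```

So utf8.ValidString remains useful for strings that do not come from a decoder that sanitizes them, for example raw bytes converted directly to a string.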

huangapple
  • Posted on 2023-04-23 02:00:59
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76081160.html