Extra escape character in Go URL

Question

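I have the following snippet of code:

    // Work on a copy so the shared base URL is not modified.
    u := *baseURL
    u.User = nil
    // Split the request path into path and query at the first '?'.
    if q := strings.Index(path, "?"); q > 0 {
        u.Path = path[:q]
        u.RawQuery = path[q+1:]
    } else {
        u.Path = path
    }

    log.Printf(" url %v", u.String())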

When the base URL is set to something like http://localhost:9000/buckets/test%?bucket_uuid=7864b0dcdf0a578bd0012c70aef58aca, the url package seems to add an extra escape sequence near the % sign. For example, the print statement above outputs the following:

2015/03/25 12:02:49  url http://localhost:9000/pools/default/buckets/test%2525?bucket_uuid=7864b0dcdf0a578bd0012c70aef58aca

This seems to happen only when the RawQuery field of the URL is set. Any idea why this is happening? I'm using Go version 1.3.3.

Cheers,
Manik

Answer 1

Score: 5

URLs may contain only characters of the ASCII character set, but they often need to carry characters outside that set. In such cases the URL has to be converted into a valid ASCII form.

If the raw URL contains characters outside of the allowed set, they are escaped: each offending byte is replaced with a '%' followed by two hexadecimal digits. The character '%' is therefore special and has to be escaped as well: its hexadecimal code is 25, so its escaped form is "%25".

Since your raw URL contains the character '%', it will be replaced by "%25".

Back to your example: in the printed form you see "%2525". You could ask why not just "%25"?

This is because the path you assign already contains the '%' in its escaped form, that is, it contains the escape sequence "%25". The Path field of url.URL is expected to hold the unescaped path, so when the URL is printed this value is interpreted as raw input: the '%' is replaced by "%25", which is then followed by the "25" already present in the input, resulting in "%2525".

See: HTML URL Encoding Reference

Also: RFC 1738 - Uniform Resource Locators (URL)

And also: RFC 3986 - Uniform Resource Identifier (URI): Generic Syntax

huangapple
  • Posted on March 25, 2015, 15:37:07
  • When reposting, please keep the link to this article: https://go.coder-hub.com/29249900.html