How to convert escape characters in HTML tags?

Question

How can we directly convert "\u003chtml\u003e" to "<html>"? Converting "<html>" to "\u003chtml\u003e" is quite easy using json.Marshal(), but json.Unmarshal() is quite lengthy and cumbersome. Is there any direct way to do that in Go?

Answer 1

Score: 3

You can use strconv.Unquote() to do the conversion.

One thing you should be aware of is that strconv.Unquote() can only unquote strings that are in quotes (e.g. start and end with a double quote " or a backtick `), so we have to add the quotes manually.

Example:

// Important to use backtick ` (raw string literal)
// else the compiler will unquote it (interpreted string literal)!

s := `\u003chtml\u003e`
fmt.Println(s)
s2, err := strconv.Unquote(`"` + s + `"`)
if err != nil {
	panic(err)
}
fmt.Println(s2)

Output:

\u003chtml\u003e
<html>
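
For reference, the snippet above can be wrapped into a complete program. This is only a minimal sketch: the helper name unquoteEscapes is hypothetical, and it assumes the input contains no unescaped double quote or newline (strconv.Unquote returns an error in that case).

```go
package main

import (
	"fmt"
	"strconv"
)

// unquoteEscapes is a hypothetical helper: it wraps s in double quotes and
// lets strconv.Unquote decode Go-style escapes such as \u003c.
// Assumption: s contains no unescaped double quote or newline; otherwise
// Unquote reports a syntax error.
func unquoteEscapes(s string) (string, error) {
	return strconv.Unquote(`"` + s + `"`)
}

func main() {
	// Raw string literal, so the backslashes reach Unquote untouched.
	out, err := unquoteEscapes(`\u003chtml\u003e`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out) // <html>
}
```

Returning the error instead of panicking leaves it to the caller to decide what to do with input that is not a valid quoted string.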

Note: To do HTML text escaping and unescaping, you can use the html package. Quoting its doc:

> Package html provides functions for escaping and unescaping HTML text.

But the html package (specifically html.UnescapeString()) does not decode Unicode escape sequences of the form \uxxxx, only HTML entities such as &lt;, &#decimal; or &#xHH;.

Example:

fmt.Println(html.UnescapeString(`\u003chtml\u003e`)) // wrong
fmt.Println(html.UnescapeString(`&#60;html&#62;`))   // good
fmt.Println(html.UnescapeString(`&#x3c;html&#x3e;`)) // good

Output:

\u003chtml\u003e
<html>
<html>

Note #2:

You should also note that if you write code like this:

s := "\u003chtml\u003e"

the compiler itself unquotes the string, because it is an interpreted string literal, so you can't really test the conversion that way. To put the escaped text into the source as-is, use either a raw string literal (backticks) or an interpreted string literal in which each backslash is itself escaped:

s := "\u003chtml\u003e" // Interpreted string literal (unquoted by the compiler!)
fmt.Println(s)

s2 := `\u003chtml\u003e` // Raw string literal (no unquoting will take place)
fmt.Println(s2)

s3 := "\\u003chtml\\u003e" // Interpreted string literal with escaped backslashes
                           // (the compiler turns each \\ into a single \)
fmt.Println(s3)

Output:

<html>
\u003chtml\u003e
\u003chtml\u003e
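
The difference between the three literal forms can also be checked by comparing lengths. The following is a small sketch; the variable names are illustrative only.

```go
package main

import "fmt"

// Three ways to write the same-looking literal (names are illustrative).
var (
	interpreted = "\u003chtml\u003e"   // the compiler decodes the escapes
	raw         = `\u003chtml\u003e`   // backslashes are kept as written
	escaped     = "\\u003chtml\\u003e" // each \\ becomes a single backslash
)

func main() {
	fmt.Println(len(interpreted)) // 6
	fmt.Println(len(raw))         // 16
	fmt.Println(raw == escaped)   // true
}
```

Only the raw and the backslash-escaped forms still contain the \u sequences at run time, so only those two are meaningful inputs for strconv.Unquote().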

Answer 2

Score: 1

You can use the fmt string formatting package for this purpose.

fmt.Printf("%v", "\u003chtml\u003e") // will output <html>

https://play.golang.org/p/ZEot6bxO1H
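
Note that this one-liner prints <html> because the compiler has already decoded the interpreted string literal, as Answer 1 points out; fmt itself does not decode \u sequences. A small sketch that makes the difference visible (the helper name format is hypothetical):

```go
package main

import "fmt"

// format is a hypothetical helper that formats s with %v, as in the answer.
func format(s string) string {
	return fmt.Sprintf("%v", s)
}

func main() {
	// Interpreted literal: decoded by the compiler before fmt ever sees it.
	fmt.Println(format("\u003chtml\u003e")) // <html>

	// Raw literal (stands in for text read at run time): fmt leaves it alone.
	fmt.Println(format(`\u003chtml\u003e`)) // \u003chtml\u003e
}
```

For text that arrives at run time, for example from a file or a network response, the escapes therefore still have to be decoded explicitly.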

Answer 3

Score: 1

I think it's a common problem. This is how I got it to work.

package main

import (
	"encoding/json"
	"fmt"
	"strconv"
	"strings"
)

// unescapeUnicodeCharactersInJSON decodes \uXXXX sequences in raw JSON.
// Caveat: an escaped backslash followed by "u" (\\u) is rewritten too, and a
// decoded \u0022 (a double quote) leaves the result invalid as JSON.
func unescapeUnicodeCharactersInJSON(jsonRaw json.RawMessage) (json.RawMessage, error) {
	// Quote doubles every backslash; turning \\u back into \u lets Unquote decode it.
	str, err := strconv.Unquote(strings.ReplaceAll(strconv.Quote(string(jsonRaw)), `\\u`, `\u`))
	if err != nil {
		return nil, err
	}
	return []byte(str), nil
}

func main() {
	// Both are valid JSON.
	var jsonRawEscaped json.RawMessage   // json raw with escaped unicode chars
	var jsonRawUnescaped json.RawMessage // json raw with unescaped unicode chars

	// '\u263a' == '☺'
	jsonRawEscaped = []byte(`{"HelloWorld": "\uC548\uB155, \uC138\uC0C1(\u4E16\u4E0A). \u263a"}`) // "\\u263a"
	jsonRawUnescaped, _ = unescapeUnicodeCharactersInJSON(jsonRawEscaped)                        // "☺"

	fmt.Println(string(jsonRawEscaped))   // {"HelloWorld": "\uC548\uB155, \uC138\uC0C1(\u4E16\u4E0A). \u263a"}
	fmt.Println(string(jsonRawUnescaped)) // {"HelloWorld": "안녕, 세상(世上). ☺"}
}

https://play.golang.org/p/pUsrzrrcDG-

Hope this helps someone.

huangapple
  • Posted on April 10, 2016, 18:26:24
  • When reposting, please keep the link to this article: https://go.coder-hub.com/36528575.html