August 13, 2017, 00:50:33 · go

How to get '<' and '>' in XML string?

Question

Is it possible to get the '<' and '>' characters out of this XML string? I have a problem with unmarshalling, and I can't change the strings. Can anyone help me with this? Here is my code:

package main

import (
	"encoding/xml"
	"fmt"
)

func main() {
	type Example struct {
		XMLName  xml.Name `xml:"Shop"`
		ShopName string   `xml:"ShopName"`
	}

	myString1 := `<Shop>
		<ShopName>Fresh Fruit <Fruit Shop></ShopName>
	</Shop>`

	myString2 := `<Shop>
		<ShopName>Fresh Fruit < Fruit Shop ></ShopName>
	</Shop>`

	//example 1
	var example1 Example
	err := xml.Unmarshal([]byte(myString1), &example1)
	if err != nil {
		fmt.Println("error: %example1", err)
	} else {
		fmt.Println(example1.ShopName)
	}

	//example 2
	var example2 Example
	err = xml.Unmarshal([]byte(myString2), &example2)
	if err != nil {
		fmt.Printf("error: %example2", err)
		return
	} else {
		fmt.Println(example2.ShopName)
	}
}

I get the errors below:

error: %example1 XML syntax error on line 2: attribute name without = in element
error: &{%!e(string=expected element name after <) %!e(int=2)}xample2

What I want to get:

Fresh Fruit <Fruit Shop>
Fresh Fruit < Fruit Shop >

Answer 1

Score: 1

The input you have is definitely invalid XML: the literal < and > inside the element content should have been escaped as &lt; and &gt;. There is a bug in the routine that creates the XML.
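For comparison, here is a minimal sketch of what a correct creation routine could do, assuming it builds the document from plain strings. The helper names escapeText and parseShopName are made up for this illustration; xml.EscapeText and xml.Unmarshal are the standard library calls.

```go
package main

import (
	"bytes"
	"encoding/xml"
	"fmt"
)

type Example struct {
	XMLName  xml.Name `xml:"Shop"`
	ShopName string   `xml:"ShopName"`
}

// escapeText (hypothetical helper) returns s with XML-special
// characters such as '<' and '>' replaced by entities.
func escapeText(s string) string {
	var buf bytes.Buffer
	if err := xml.EscapeText(&buf, []byte(s)); err != nil {
		return ""
	}
	return buf.String()
}

// parseShopName (hypothetical helper) unmarshals doc and returns
// the text of its ShopName element.
func parseShopName(doc string) (string, error) {
	var e Example
	if err := xml.Unmarshal([]byte(doc), &e); err != nil {
		return "", err
	}
	return e.ShopName, nil
}

func main() {
	// Escape the content before wrapping it in tags.
	doc := "<Shop><ShopName>" + escapeText("Fresh Fruit <Fruit Shop>") + "</ShopName></Shop>"
	fmt.Println(doc)

	name, err := parseShopName(doc)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(name) // the decoder turns the entities back into '<' and '>'
}
```

With the content escaped at creation time, the struct from the question unmarshals without any preprocessing.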

Idea

Since you say you have to deal with the input as it is, here is a suggestion:

Use a regex to replace all closing tags with something that will practically never occur in your input (e.g. @#lt#@/tagname@#gt#@). While doing that, save all the distinct tag names to a slice.
Using the slice of tag names, replace the start tags in the same way.
Escape all remaining < and >.
Finally, turn the placeholders back into tags: @#lt#@ becomes < and @#gt#@ becomes >.

You should now have valid XML that can be parsed.

Proof of Concept

package main

import (
	"bytes"
	"fmt"
	"regexp"
	"sort"
)

var (
	rlt = []byte("@#lt#@") // placeholder for the '<' of a real tag
	rgt = []byte("@#gt#@") // placeholder for the '>' of a real tag
	lt  = []byte("&lt;")
	gt  = []byte("&gt;")
)

// closingTag matches closing tags such as </ShopName>
var closingTag = regexp.MustCompile(`</([^<>]*)>`)

// used for sorting strings by length
type ByLength []string

func (s ByLength) Len() int           { return len(s) }
func (s ByLength) Swap(i, j int)      { s[i], s[j] = s[j], s[i] }
func (s ByLength) Less(i, j int) bool { return len(s[i]) < len(s[j]) }

func main() {
	s := `<Shop>
	<ShopName>Fresh Fruit <Fruit Shop></ShopName>
	<ShopName attr="val1">Fresh Fruit <Shop test></ShopName>
</Shop>`

	seen := map[string]bool{}
	names := []string{}
	out := closingTag.ReplaceAllFunc([]byte(s), func(b []byte) []byte {
		name := b[2 : len(b)-1] // strip "</" and ">"

		// only remember each tag name once
		if !seen[string(name)] {
			seen[string(name)] = true
			names = append(names, string(name))
		}

		// keep the '/' so the tag is still a closing tag after restoring
		buf := make([]byte, 0, len(rlt)+1+len(name)+len(rgt))
		buf = append(buf, rlt...)
		buf = append(buf, '/')
		buf = append(buf, name...)
		buf = append(buf, rgt...)
		return buf
	})

	// sort names descending by length, otherwise we risk replacing parts of names, as with <Shop and <ShopName
	sort.Sort(sort.Reverse(ByLength(names)))

	for _, name := range names {
		// replace only exact start tags
		out = bytes.Replace(out, []byte("<"+name+">"), []byte("@#lt#@"+name+"@#gt#@"), -1)

		// replace start tags with attributes
		r3 := regexp.MustCompile("<" + regexp.QuoteMeta(name) + `( [^<>=]+="[^<>]+)>`)
		out = r3.ReplaceAll(out, []byte("@#lt#@"+name+"${1}@#gt#@"))
	}

	// escape everything that was not recognised as a tag
	out = bytes.Replace(out, []byte{'<'}, lt, -1)
	out = bytes.Replace(out, []byte{'>'}, gt, -1)

	// turn the placeholders back into real tag delimiters
	out = bytes.Replace(out, rlt, []byte{'<'}, -1)
	out = bytes.Replace(out, rgt, []byte{'>'}, -1)

	fmt.Println(string(out))
}

Notes

This is a proof of concept and is not optimised for performance.
You may still run into content that is not escaped properly, in which case the algorithm needs further refinement. Content that looks like one of the known tags, such as <tagname> or <tagname something ="something>, is wrongly treated as a tag. Expect some of the XML to remain invalid, and log those cases so you can improve the algorithm.
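One way to find the documents that are still invalid is to run each repaired string through the decoder before unmarshalling and log any failure. This is only a sketch: wellFormed is a made-up helper name, and the two sample strings stand in for real repaired output.

```go
package main

import (
	"encoding/xml"
	"io"
	"log"
	"strings"
)

// wellFormed (hypothetical helper) reads every token of doc and returns
// the first error the decoder reports, or nil if the end of the input
// is reached cleanly.
func wellFormed(doc string) error {
	dec := xml.NewDecoder(strings.NewReader(doc))
	for {
		_, err := dec.Token()
		if err == io.EOF {
			return nil
		}
		if err != nil {
			return err
		}
	}
}

func main() {
	docs := []string{
		"<Shop><ShopName>Fresh Fruit &lt;Fruit Shop&gt;</ShopName></Shop>", // repaired correctly
		"<Shop><ShopName>Fresh Fruit <Fruit Shop></ShopName></Shop>",       // still broken
	}
	for _, d := range docs {
		if err := wellFormed(d); err != nil {
			// Keep the offending input so the repair rules can be improved.
			log.Printf("invalid XML after repair: %v\ninput: %s", err, d)
		}
	}
}
```

The logged inputs then serve as test cases when the escaping rules are extended.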
