Question

I'm currently using XMLInputFactory and XMLEventReader to parse XML from an RSS feed. The description element contains HTML tags written with &gt; and &lt;. Java seems to read these as actual tags and to treat them as the end of the description, so the text is cut off and the parser moves on to the next element. How can I keep these tags from being parsed?

Answer 1

Score: 0

I don't use the pull parser (XMLEventReader) much, but I believe that, as with the SAX parser, it can report a text node as a sequence of Characters events rather than as a single event, and it's up to the application to concatenate them. The parser is most likely to split the content at entity boundaries, to avoid bulk copying of character data when expanding entities.
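To illustrate the concatenation described in this answer, here is a minimal sketch that collects every Characters event inside a description element into one string. The class name DescriptionReader, the method readDescription, and the sample XML are hypothetical and not from the original thread. The sketch also sets XMLInputFactory.IS_COALESCING as an extra measure, on the assumption that the feed's description holds only text and escaped markup, with no nested elements.

```java
import java.io.StringReader;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical helper: joins all Characters events of the first <description> element.
class DescriptionReader {

    static String readDescription(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Ask the parser to merge adjacent character data into a single event.
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        XMLEventReader reader = factory.createXMLEventReader(new StringReader(xml));

        StringBuilder text = new StringBuilder();
        boolean inDescription = false;
        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            if (event.isStartElement()
                    && "description".equals(event.asStartElement().getName().getLocalPart())) {
                inDescription = true;
            } else if (event.isCharacters() && inDescription) {
                // Append rather than overwrite: the text may still arrive in several pieces.
                text.append(event.asCharacters().getData());
            } else if (event.isEndElement()
                    && "description".equals(event.asEndElement().getName().getLocalPart())) {
                break;
            }
        }
        reader.close();
        return text.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        // Assumed sample input: an RSS-like item whose description holds escaped HTML.
        String xml = "<item><description>&lt;p&gt;Hello &amp; welcome&lt;/p&gt;</description></item>";
        System.out.println(readDescription(xml));
    }
}
```

Appending in the loop keeps the code correct whether or not the parser honors coalescing. For elements that contain only text, XMLEventReader.getElementText() may be a shorter alternative once the start element has been read.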

Answer 2

Score: -1

You could temporarily replace every &gt; and &lt; with a unique placeholder that you know cannot appear in the text, do your parsing, and then turn the placeholders back into &gt; and &lt; when you are done, as in the following code.

String original = "<container>&gt;This&lt; is a &gt;test&lt;</container>";
String newStr = original.replace("&gt;", "_TMP_CHARACTER_G_").replace("&lt;", "_TMP_CHARACTER_L_");
System.out.println(original + "\n" + newStr);
// Prints <container>&gt;This&lt; is a &gt;test&lt;</container>
// and <container>_TMP_CHARACTER_G_This_TMP_CHARACTER_L_ is a _TMP_CHARACTER_G_test_TMP_CHARACTER_L_</container>

// [Do your parsing here]

// newStr stands in here for the text extracted by the parser
String theTagYouWant = newStr;
String theConvertedTag = theTagYouWant.replace("_TMP_CHARACTER_G_", "&gt;").replace("_TMP_CHARACTER_L_", "&lt;");
System.out.println(theConvertedTag);
// Prints the original string <container>&gt;This&lt; is a &gt;test&lt;</container>
