April 6, 2020 20:24:43


Java InputStream reading the character \" instead of showing it as " for an HTML content file

Question

I want to read an HTML content file that is stored in a string format.

The file content is as below:

<table class=\"relative-table\" style

But when I inspect it in Java, it shows as below:

<table class="\&quot;relative-table\&quot;" style=

My expectation was the following:

<table class="relative-table" style

Below is my Java code:



    File file = new File("C:\\Users\\table.xml");
    Document doc;
    try {
        InputStream stream = new FileInputStream(file);
        doc = Jsoup.parse(stream, null, "UTF-8", Parser.xmlParser());
    } catch (IOException e) {
        e.printStackTrace();
    }

Sample source file

<table class=\"relative-table\" style=\"width: 100.0%;\">
  <colgroup>
    <col style=\"width: 10%;\" />
    <col style=\"width: 20%;\" />
    <col style=\"width: 70%;\" />
  </colgroup>
  <tbody>
    <tr>
   ........

Answer 1

Score: 1

The problem seems to be that those backslashes do not belong in the file content. (In a Java String literal "... \" ... ", backslash+quote simply represents the quote character.) Hence each quote is seen as part of an unquoted HTML attribute value, and is actually "repaired" into the HTML/XML entity &quot;.
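To make the distinction concrete, here is a minimal sketch using only the JDK. The class and method names (`EscapeDemo`, `fromLiteral`, `fromFile`) are hypothetical and exist only for illustration; `fromFile` merely imitates what the question says the file contains.

```java
class EscapeDemo {
    // A Java string literal with \" : the compiler consumes the backslash,
    // so the resulting String holds only the quote character.
    static String fromLiteral() {
        return "class=\"relative-table\"";
    }

    // Imitates the file content from the question: a real backslash
    // character sits in front of each quote (written here as \\ then \").
    static String fromFile() {
        return "class=\\\"relative-table\\\"";
    }

    public static void main(String[] args) {
        System.out.println(fromLiteral()); // class="relative-table"
        System.out.println(fromFile());    // class=\"relative-table\"
        // The two extra characters are the literal backslashes.
        System.out.println(fromFile().length() - fromLiteral().length()); // 2
    }
}
```

The second string is what a parser receives when the file is read byte for byte, which is why the attribute value no longer starts with a quote.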

    Path file = Paths.get("C:\\Users\\table.xml");
    String content = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    // Replace every backslash+quote pair in the file with a plain quote.
    content = content.replace("\\\"", "\"");
    ByteArrayInputStream bais = new ByteArrayInputStream(
            content.getBytes(StandardCharsets.UTF_8));

    Document doc;
    try {
        // Argument order is (stream, charsetName, baseUri, parser).
        doc = Jsoup.parse(bais, "UTF-8", "", Parser.xmlParser());
    } catch (IOException e) {
        e.printStackTrace();
    }

This ugly patch has one flaw: one cannot be sure that nothing else in the file is affected by the same escaping.
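If the file turns out to be a dumped JSON or Java string literal (an assumption; the question does not say how the file was produced), other escapes such as `\\`, `\n` or `\t` may be present as well. The following is only a sketch of a single-pass unescaper for that case. `BackslashUnescaper` and `unescape` are hypothetical names, not part of jsoup or the JDK, and the set of escapes handled is a guess.

```java
class BackslashUnescaper {
    // Hypothetical helper: undoes common backslash escapes in one pass,
    // so that an escaped backslash (\\) is never misread as the start
    // of another escape sequence.
    static String unescape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c != '\\' || i + 1 == s.length()) {
                out.append(c); // ordinary char, or a lone trailing backslash
                continue;
            }
            char next = s.charAt(++i);
            switch (next) {
                case '"':  out.append('"');  break;
                case '\\': out.append('\\'); break;
                case '/':  out.append('/');  break;
                case 'n':  out.append('\n'); break;
                case 'r':  out.append('\r'); break;
                case 't':  out.append('\t'); break;
                default:   out.append('\\').append(next); // unknown escape: keep as-is
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String raw = "<col style=\\\"width: 10%;\\\" />";
        System.out.println(unescape(raw)); // <col style="width: 10%;" />
    }
}
```

The result of `unescape(content)` could then be handed to the parser in place of the output of the `replace` call. Whether this is appropriate depends on how the file was actually written, so it is worth inspecting the file for other backslash sequences first.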
