Can't validate HTML content that contains < and > characters with DOMParser in JavaScript

Question

I'm trying to build an HTML editor validator in JavaScript.

Here is the JS code that checks whether the HTML entered by the user is valid:

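// validateHTML is an illustrative wrapper name; return needs an enclosing function
function validateHTML(myHTML) {
  let parser = new DOMParser();
  // Wrap the input in a single root element and parse it as XML;
  // a well-formedness error produces a <parsererror> element
  let doc = parser.parseFromString('<html>' + myHTML + '</html>', 'application/xml');
  let errorNode = doc.querySelector('parsererror');
  if (errorNode) {
    return 'HTML is not valid';
  } else {
    return 'HTML is valid';
  }
}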

Example that returns "HTML is valid" (the code above works as expected here):

<div>This HTML is valid</div>

Example that returns "HTML is not valid" (the code above works as expected here):

<div>This HTML is not valid</divv>

In the following cases, however, my code doesn't work as expected: it treats the "<" and ">" characters as part of an HTML tag and returns an error:

<div> 2 < 3 </div> -> this is valid HTML, but the code above returns "HTML is not valid".

<div> 4 > 3 </div> -> this is valid HTML, but the code above returns "HTML is not valid".

How should I go about it?
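Some background on the failure: because the code parses with 'application/xml', XML's well-formedness rules apply, and under them a < in text content may only start markup, i.e. it must be followed by a tag name, /, ! or ?. A bare > in text content is permitted by XML 1.0 (except in the sequence ]]>), so it is mainly < (and &) that trigger errors. As a rough sketch under that assumption, the hypothetical helper escapeStrayLt (not part of the original post) escapes every < that cannot begin markup before the string is passed to DOMParser. It only recognizes ASCII tag names, so it is an approximation, not a real XML tokenizer.

```javascript
// Hypothetical helper (not from the original post): escape '<' characters
// that cannot begin XML markup. In XML, '<' must be followed by a
// name-start character (start tag), '/' (end tag), '!' (comment, CDATA,
// DOCTYPE) or '?' (processing instruction). Only ASCII name-start
// characters are recognized here, so this is a rough approximation.
function escapeStrayLt(markup) {
  return markup.replace(/<(?![A-Za-z_:\/!?])/g, '&lt;');
}

// Inputs from the question
console.log(escapeStrayLt('<div> 2 < 3 </div>')); // <div> 2 &lt; 3 </div>
console.log(escapeStrayLt('<div> 4 > 3 </div>')); // unchanged: '>' is allowed in XML text
console.log(escapeStrayLt('<div>a<b</div>'));     // unchanged: '<b' looks like a tag
```

Input such as a<b still fails to parse, because <b looks like the start of a tag; there the user has to write &lt; explicitly.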

Answer 1

Score: 1

I wouldn't call what DOMParser.parseFromString() does "validation". With the text/html MIME type, it applies HTML5's error-tolerant parsing rules to reach a result (any result) that's hopefully consistent across browsers; with application/xml, as in the question, it only checks that the input is well-formed XML. Note that <html><div>...</div></html> is not a valid, complete HTML document: at least a <title> element is required (inside <head>, whose tags can be omitted), and according to <https://developer.mozilla.org/en-US/docs/Web/API/DOMParser/parseFromString> the input to parseFromString() is parsed as a complete document rather than a document fragment. The whole story of HTML validation is considerably more complicated and also depends on the version of HTML you're targeting. You can read about validating HTML using SGML at, e.g., <https://validator.w3.org/docs/sgml.html>, <https://sgmljs.net/docs/parsing-html-tutorial/parsing-html-tutorial.html>, and <https://sgmljs.net/docs/html200129.html>, and you can also check out W3C's Java-based Nu validator.

Anyway, to answer your question, try escaping (replacing) < characters with &lt; entity references, e.g. <div> 2 &lt; 3 </div>.
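A minimal sketch of that advice, assuming the user's text is available separately from the markup (for example, from a plain-text field): the hypothetical helper escapeXmlText also escapes &, since a bare & is a well-formedness error in XML too, and escapes > for symmetry, which is harmless. The resulting string can then be checked with the DOMParser code from the question.

```javascript
// Hypothetical helper (not from the original answer): escape plain text so
// it can be placed inside an element's content in XML/XHTML markup.
// '&' is replaced first so the entity references produced for '<' and '>'
// are not escaped a second time.
function escapeXmlText(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Build the markup from trusted tags plus escaped user text
const markup = '<div>' + escapeXmlText('2 < 3 & 4 > 3') + '</div>';
console.log(markup); // <div>2 &lt; 3 &amp; 4 &gt; 3</div>
```

Escaping the whole input would also escape the user's real tags, which is why this only works when text and markup are kept apart.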

Posted by huangapple on May 17, 2023, 16:37:31.
Original link: https://go.coder-hub.com/76270090.html