2023-05-17 16:37:31

Can't validate HTML content with DOMParser in JavaScript if it contains < and > characters

Question

I'm trying to build an HTML editor validator in JavaScript.

Here is the JS code that checks if the HTML inserted by the user is valid or not:

// Wrapper function added so the return statements are valid; the name is illustrative.
function validateHTML(myHTML) {
  // Parse as XML so that malformed markup yields a <parsererror> element.
  let parser = new DOMParser();
  let doc = parser.parseFromString('<html>' + myHTML + '</html>', 'application/xml');
  let errorNode = doc.querySelector('parsererror');
  if (errorNode) {
    return 'HTML is not valid';
  } else {
    return 'HTML is valid';
  }
}

Example that returns "HTML is valid" (the code above works as expected here):

<div>This HTML is valid</div>

Example that returns "HTML is not valid" (the code above works as expected here):

<div>This HTML is not valid</divv>

In the following cases, however, my code doesn't work as I expected.
It treats the "<" and ">" characters as part of an HTML tag and returns an error:

<div> 2 < 3 </div> -> this is valid HTML, but the code above returns "HTML is not valid".

<div> 4 > 3 </div> -> this is valid HTML, but the code above returns "HTML is not valid".

How should I handle this?

Answer 1

Score: 1

I wouldn't call what DOMParser.parseFromString() does "validation"; rather, it applies HTML5's fallback parsing rules to reach a result -- any result -- that's hopefully consistent across browsers. Note that <html><div>...</div></html> is not a valid, complete HTML document, since at least a <title> element is required (as part of <head>, which can be omitted) when the input to parseFromString() is supposed to be a complete document rather than a document fragment, according to <https://developer.mozilla.org/en-US/docs/Web/API/DOMParser/parseFromString>.

The whole story of HTML validation is way more complicated, and also depends on the version of HTML you're targeting. You can read about validating HTML using SGML at e.g. <https://validator.w3.org/docs/sgml.html>, <https://sgmljs.net/docs/parsing-html-tutorial/parsing-html-tutorial.html>, and <https://sgmljs.net/docs/html200129.html>, and you can also check out W3C's Java-based Nu validator.

Anyway, to answer your question, try escaping (replacing) < characters with &lt; entity references.
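
The escaping suggested above could be sketched roughly as follows. This is only a minimal illustration, not the answerer's code: the helper names escapeStrayLessThan and isWellFormed are hypothetical, and the rule "a < starts a tag only when followed by a letter, /, ! or ?" is a simplifying assumption that will misjudge text such as "a<b". Other characters that are special in XML, such as a bare &, are left untouched.

```javascript
// Hypothetical helper (an assumption, not from the question or the answer).
// Replaces every "<" that does not look like the start of a tag, comment,
// doctype, or processing instruction with the "&lt;" entity reference.
function escapeStrayLessThan(html) {
  // Negative lookahead: keep "<" only when a letter, "/", "!" or "?" follows.
  return html.replace(/<(?![A-Za-z\/!?])/g, '&lt;');
}

// Browser-only usage sketch mirroring the check from the question.
// DOMParser is a browser API and is not defined in plain Node.js.
function isWellFormed(html) {
  const doc = new DOMParser().parseFromString(
    '<html>' + escapeStrayLessThan(html) + '</html>',
    'application/xml'
  );
  // The XML parser reports failures through a <parsererror> element.
  return doc.querySelector('parsererror') === null;
}
```

With this sketch, the text in <div> 2 < 3 </div> is rewritten to use &lt; before it reaches the XML parser, while the tags themselves pass through unchanged.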
