July 11, 2023, 03:33:11

Bad HTML filtering regexp

Question

Do you know why CodeQL suggests this? What is wrong with the code?

values = vals.replace(/(^\s*<!--)|(-->\s*$)|\s+/g, '').split(',');

This regular expression only parses --> and not --!> as a HTML comment end tag.
CodeQL

Answer 1

Score: 0

I'm not saying it should look for `--!>`; however, if you need an explanation of your regular expression, here it is.

vals.replace(/(^\s*<!--)|(-->\s*$)|\s+/g, '').split(',');

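`(^\s*<!--)` matches zero or more whitespace characters at the start of the string, followed by `<!--`. Without the `m` flag, `^` anchors to the start of the whole string, not to each line.

`(-->\s*$)` matches `-->` followed by zero or more whitespace characters at the end of the string.

So to also match `--!>`, you would add `(--!>\s*$)`: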

vals.replace(/(^\s*<!--)|(-->\s*$)|(--!>\s*$)|\s+/g, '').split(',');
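
To see what the extra alternative changes, here is a minimal sketch that runs both patterns on the same input. The helper names `parseOriginal` and `parseExtended` and the sample string are assumptions made up for illustration; they are not from the question.

```javascript
// Hypothetical helpers wrapping the two patterns discussed in this answer.
const ORIGINAL = /(^\s*<!--)|(-->\s*$)|\s+/g;
const EXTENDED = /(^\s*<!--)|(-->\s*$)|(--!>\s*$)|\s+/g;

// Strips the comment delimiters and all whitespace, then splits on commas.
function parseOriginal(vals) {
  return vals.replace(ORIGINAL, '').split(',');
}

function parseExtended(vals) {
  return vals.replace(EXTENDED, '').split(',');
}

// Assumed sample input that ends with the non-standard "--!>" closer.
const sample = '<!-- a, b, c --!>';

console.log(parseOriginal(sample)); // [ 'a', 'b', 'c--!>' ]
console.log(parseExtended(sample)); // [ 'a', 'b', 'c' ]
```

With the original pattern, the `--!>` closer is left attached to the last value. The two closing alternatives could probably also be merged into one as `(--!?>\s*$)`.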

That said, I don't think I've ever seen `--!>`, so double-check that you really need it. https://www.w3schools.com/tags/tag_comment.asp

This is a good starting point for learning regular expressions: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_expressions

Answer 2

Score: 0

Normally, CodeQL code scanning on GitHub should also include a detailed explanation of the query (maybe it is just cut off in your screenshot). In this case, the query seems to be "Bad HTML filtering regexp". Its description says:

> It is possible to match some single HTML tags using regular expressions (parsing general HTML using regular expressions is impossible). However, if the regular expression is not written well it might be possible to circumvent it, which can lead to cross-site scripting or other security issues.
>
> Some of these mistakes are caused by browsers having very forgiving HTML parsers, and will often render invalid HTML containing syntax errors. Regular expressions that attempt to match HTML should also recognize tags containing such syntax errors.

And in the Example section it says:
> Other corner cases include that HTML comments can end with --!> [...]

That corner case for `--!>` is actually described in the HTML specification as the `incorrectly-closed-comment` error:
> This error occurs if the parser encounters a comment that is closed by the "--!>" code point sequence. The parser treats such comments as if they are correctly closed by the "-->" code point sequence.

In summary, if you choose to use a regex to parse HTML comments, then for security reasons you should also treat `--!>` as the end of an HTML comment.
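
As a rough illustration of why this matters, the sketch below compares where two patterns think the first comment ends. The names `STRICT`, `LENIENT`, and `firstComment`, and the sample input, are assumptions for this example only. It is not a complete HTML filter, and it does not cover the other corner cases the query description mentions.

```javascript
// Hypothetical patterns for locating the first HTML comment in a string.
const STRICT = /<!--[\s\S]*?-->/;    // only accepts "-->" as the closer
const LENIENT = /<!--[\s\S]*?--!?>/; // also accepts "--!>" as the closer

// Returns the text of the first comment the pattern finds, or null.
function firstComment(html, pattern) {
  const match = html.match(pattern);
  return match ? match[0] : null;
}

// Assumed input: per the spec text quoted above, the parser closes the
// comment at "--!>", so the <b> element after it is live markup.
const input = '<!-- note --!><b>live markup</b> -->';

console.log(firstComment(input, STRICT));  // <!-- note --!><b>live markup</b> -->
console.log(firstComment(input, LENIENT)); // <!-- note --!>
```

The strict pattern treats the `<b>` element as part of the comment, while the parser behavior quoted above would end the comment at `--!>`. A filter that assumes comment contents are inert would therefore misjudge this input.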
