Bad HTML filtering regexp

Question

Do you know why CodeQL flags this code? What is wrong with it?

values = vals.replace(/(^\s*<!--)|(-->\s*$)|\s+/g, '').split(',');

CodeQL reports:

> This regular expression only parses --> and not --!> as a HTML comment end tag.


Answer 1

Score: 0

This isn't to say the expression should look for `--!>`, but in case an explanation of your regular expression helps, here it is.

vals.replace(/(^\s*<!--)|(-->\s*$)|\s+/g, '').split(',');

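`(^\s*<!--)` matches `<!--` at the start of the string, optionally preceded by whitespace. There is no `m` flag, so `^` anchors to the start of the whole string rather than to the start of each line.

`(-->\s*$)` matches `-->` at the end of the string, optionally followed by whitespace.

`\s+` matches every remaining run of whitespace, so the replace also strips all whitespace from the values.

To handle `--!>` as well, add `(--!>\s*$)`: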

vals.replace(/(^\s*<!--)|(-->\s*$)|(--!>\s*$)|\s+/g, '').split(',');
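To see the difference between the two expressions above, here is a minimal sketch that runs both on sample input. The helper `parseValues`, the sample strings, and the `compact` pattern are assumptions made for illustration; only `original` and `extended` come from the question and this answer.

```javascript
// Original expression from the question.
const original = /(^\s*<!--)|(-->\s*$)|\s+/g;
// Extended expression from this answer.
const extended = /(^\s*<!--)|(-->\s*$)|(--!>\s*$)|\s+/g;
// Hypothetical compact variant (not from the answer): "!?" makes the "!" optional.
const compact = /(^\s*<!--)|(--!?>\s*$)|\s+/g;

// Hypothetical helper: strip the comment delimiters and all whitespace,
// then split the remainder on commas.
function parseValues(vals, pattern) {
  return vals.replace(pattern, '').split(',');
}

// Made-up sample inputs.
const standard = '  <!-- a, b , c -->  ';
const bang = '<!-- a, b, c --!>';

console.log(parseValues(standard, original)); // [ 'a', 'b', 'c' ]
console.log(parseValues(bang, original));     // [ 'a', 'b', 'c--!>' ]
console.log(parseValues(bang, extended));     // [ 'a', 'b', 'c' ]
console.log(parseValues(bang, compact));      // [ 'a', 'b', 'c' ]
```

With the original expression, the trailing `--!>` is left attached to the last value, which is the mismatch the CodeQL alert points at. The compact variant should behave the same as the extended one on these inputs.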

That said, I don't think I've ever seen `--!>` in practice, so double-check that you really need it: https://www.w3schools.com/tags/tag_comment.asp

A good starting point for learning regular expressions: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_expressions

Answer 2

Score: 0

CodeQL code scanning on GitHub normally also shows a detailed explanation of the query (perhaps it was cut off in your screenshot). Here the query appears to be Bad HTML filtering regexp, whose description says:

> It is possible to match some single HTML tags using regular expressions (parsing general HTML using regular expressions is impossible). However, if the regular expression is not written well it might be possible to circumvent it, which can lead to cross-site scripting or other security issues.
>
> Some of these mistakes are caused by browsers having very forgiving HTML parsers, and will often render invalid HTML containing syntax errors. Regular expressions that attempt to match HTML should also recognize tags containing such syntax errors.

The Example section adds:

> Other corner cases include that HTML comments can end with --!> [...]

This `--!>` corner case is described in the HTML specification as the incorrectly-closed-comment error:

> This error occurs if the parser encounters a comment that is closed by the "--!>" code point sequence. The parser treats such comments as if they are correctly closed by the "-->" code point sequence.

In summary, if you choose to parse HTML comments with a regular expression, then for security reasons you should also treat `--!>` as the end of an HTML comment.
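To illustrate how a regular expression and an HTML parser can disagree about where a comment ends, here is a small sketch. The function `stripComments`, both patterns, and the sample markup are assumptions made for this example; they are not taken from the CodeQL documentation, and this is not a complete sanitizer.

```javascript
// Strict pattern: only "-->" ends a comment.
const strictComment = /<!--[\s\S]*?-->/g;
// Lenient pattern: "-->" or "--!>" ends a comment, as an HTML parser would treat it.
const lenientComment = /<!--[\s\S]*?--!?>/g;

// Hypothetical helper: remove everything the given pattern considers a comment.
function stripComments(html, pattern) {
  return html.replace(pattern, '');
}

// Made-up markup: a parser closes the first comment at "--!>",
// so the text "visible" lies outside any comment.
const html = '<!-- hidden --!>visible<!-- also hidden -->';

console.log(JSON.stringify(stripComments(html, strictComment))); // ""
console.log(stripComments(html, lenientComment));                // visible
```

The strict pattern treats the whole string as one comment, while a parser that follows the specification text quoted above would see "visible" as ordinary content. A filter built on the strict pattern therefore works from a different view of the document than the browser does.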

huangapple
  • Published on July 11, 2023, 03:33:11
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76656782.html