Regex to match html tags excluding some

Question

I want to remove all HTML opening/closing tags (including their attributes) that are not in my list:

const allowedTags = ['a', 'b', 'i', 's', 'u', 'sup', 'sub', 'strong', 'cite', 'code', 'del', 'em'];

Example

input

<b><i><img src="http://example.com"/>Test<strong>Passed</strong><span>without any errors</span><a href="http://example.com">click here</a></b></i>

output

<b><i>Test<strong>Passed</strong>without any errors<a href="http://example.com">click here</a></b></i>

I tried this

let regex = new RegExp(`<\/?(?!${allowedTags.join('|')})[^>]*>`, 'g')
str.replace(regex, '')

What I wanted to achieve

  • < starts with <
  • \/? may or may not contain the /
  • (?!${allowedTags.join('|')}) none of those tags (I'm new to negative lookahead)
  • [^>]* anything that is not a > (to match the attributes)
  • > ends with >
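For reference, the attempt fails on the example for two reasons. First, (?!a|b|i|s|...) only checks a prefix, so "img" is protected by "i" and "span" by "s". Second, \/? can backtrack to match nothing, so the lookahead then sees "/" and every closing tag, allowed or not, is removed. Also note that str.replace() returns a new string rather than modifying str. The sketch below reproduces the problem and shows one possible fix (slash moved inside the lookahead, plus \b so the whole tag name must match). It is only an illustration under the assumption of trusted, well-formed markup with no ">" inside attribute values; the names `naive`, `bounded` and `stripDisallowedTags` are hypothetical, and it is not a sanitizer.

```javascript
const allowedTags = ['a', 'b', 'i', 's', 'u', 'sup', 'sub', 'strong', 'cite', 'code', 'del', 'em'];
const names = allowedTags.join('|');

// Original attempt. Problems:
// 1. The lookahead only checks a prefix: "img" passes as "i", "span" as "s".
// 2. \/? can backtrack to empty, so the lookahead sees "/" and every
//    closing tag, allowed or not, gets removed.
const naive = new RegExp(`<\\/?(?!${names})[^>]*>`, 'g');

// Hypothetical fix: optional slash inside the lookahead, and \b so the
// tag name must end right after an allowed name ("\\b" in a template
// literal becomes the regex word boundary \b, not a backspace).
const bounded = new RegExp(`<(?!\\/?(?:${names})\\b)\\/?[^>]*>`, 'gi');

// replace() returns a new string; the input is left unchanged.
function stripDisallowedTags(html) {
  return html.replace(bounded, '');
}

const input = '<b><i><img src="http://example.com"/>Test<strong>Passed</strong>' +
  '<span>without any errors</span><a href="http://example.com">click here</a></b></i>';

console.log(input.replace(naive, ''));   // shows the original bug
console.log(stripDisallowedTags(input)); // regex-only variant
```

Even with the fix, a regex does not understand HTML (comments, ">" inside quoted attributes, script contents), which is why the answer recommends a real sanitizer for untrusted input.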

Answer 1

Score: 0

As discussed in the comments, don't try to do this with a regular expression. It will lead to problems and severe security holes.

sanitize-html or DOMPurify seems to be the best option for your case.

If your backend runs on another technology, such as PHP, don't forget that you should also sanitize server-side with a library like Symfony's HTML sanitizer, because spoofing an HTTP POST request is very easy.

Sanitizing in JavaScript on the front end should also be done, to protect users from pasting malicious HTML.

huangapple
  • Published on 2023-03-03 23:07:13
  • Please keep a link to this article when reposting: https://go.coder-hub.com/75628744.html