Regex to match HTML tags, excluding some
Question
I want to remove all HTML opening/closing tags (including their attributes) whose names are not in my list:
const allowedTags = ['a', 'b', 'i', 's', 'u', 'sup', 'sub', 'strong', 'cite', 'code', 'del', 'em'];
Example
Input:
<b><i><img src="http://example.com"/>Test<strong>Passed</strong><span>without any errors</span><a href="http://example.com">click here</a></b></i>
Output:
<b><i>Test<strong>Passed</strong>without any errors<a href="http://example.com">click here</a></b></i>
I tried this:
let regex = new RegExp(`<\/?(?!${allowedTags.join('|')})[^>]*>`, 'g');
const result = str.replace(regex, ''); // replace() returns a new string; str itself is unchanged
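To see why this attempt does not produce the expected output, here is a small reproduction on the example input (runnable in Node.js or a browser console). The negative lookahead only compares the first characters after `<`, so `<img` (starts with `i`) and `<span>` (starts with `s`) are kept. And because `\/?` is optional, the engine can backtrack to match zero slashes; the lookahead then sees `/` instead of a tag name and passes, so every closing tag is removed.

```javascript
// Reproduction of the attempt from the question on the example input.
const allowedTags = ['a', 'b', 'i', 's', 'u', 'sup', 'sub', 'strong', 'cite', 'code', 'del', 'em'];
const input = '<b><i><img src="http://example.com"/>Test<strong>Passed</strong><span>without any errors</span><a href="http://example.com">click here</a></b></i>';

// Same pattern as in the question: <\/?(?!a|b|i|s|...)[^>]*>
const regex = new RegExp(`<\/?(?!${allowedTags.join('|')})[^>]*>`, 'g');

// replace() returns a new string; `input` is left unchanged.
const attempted = input.replace(regex, '');
console.log(attempted);
```

The result keeps `<img>` and `<span>` (their names merely start with an allowed letter) but loses every closing tag, including `</strong>` and `</a>`.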
What I wanted to achieve:
< : starts with <
\/? : may or may not contain the /
(?!${allowedTags.join('|')}) : none of those tags (I'm new to negative lookaheads)
[^>]* : anything that is not a > (to get the attributes)
> : ends with >
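A minimal sketch of how the pattern could be adjusted, for illustration only (the answer below explains why a regex is still the wrong tool for untrusted input). Two changes, both assumptions of this sketch rather than anything from the question: the optional slash moves inside the lookahead so it cannot be skipped by backtracking, and `\b` after the alternation forces a whole tag-name match, so `i` no longer covers `img`. `stripDisallowedTags` is a hypothetical helper name.

```javascript
const allowedTags = ['a', 'b', 'i', 's', 'u', 'sup', 'sub', 'strong', 'cite', 'code', 'del', 'em'];

// (?!\/?(?:a|b|...)\b): the name after "<", with or without a leading slash,
// must not be a whole allowed tag name. In a template literal "\\b" yields the
// regex word boundary \b (a bare "\b" would be a backspace character).
// The "i" flag also keeps upper-case variants such as <STRONG>.
const disallowedTag = new RegExp(`<(?!\\/?(?:${allowedTags.join('|')})\\b)[^>]*>`, 'gi');

// Hypothetical helper: removes every tag whose name is not in allowedTags.
function stripDisallowedTags(html) {
  return html.replace(disallowedTag, '');
}

const input = '<b><i><img src="http://example.com"/>Test<strong>Passed</strong><span>without any errors</span><a href="http://example.com">click here</a></b></i>';
console.log(stripDisallowedTags(input));
```

On the example input this returns the expected output. It still keeps every attribute on allowed tags, so `<a onclick="...">` or `<a href="javascript:...">` pass through untouched, and it misparses a `>` inside a quoted attribute value; those are the kinds of holes the answer warns about.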
Answer 1
Score: 0
As discussed in the comments, don't try to do this with a regular expression. It will lead to problems and severe security holes.
sanitize-html or DOMPurify seems to be the best option for your case.
If your backend runs on another technology, such as PHP, don't forget that you should also sanitize server-side with a library like Symfony's HTML Sanitizer, because spoofing an HTTP POST request is very easy.
Sanitizing with JavaScript on the front end should also be done, to protect users from copy-pasting malicious HTML.