Java XSS Sanitization for nested HTML elements
Question
I am using the JSoup library in Java to sanitize input and prevent XSS attacks. It works well for simple inputs like <script>alert('vulnerable')</script>.
Example:
String data = "<script>alert('vulnerable')</script>";
data = Jsoup.clean(data, "", Whitelist.none());
data = StringEscapeUtils.unescapeHtml4(data); // StringEscapeUtils is from the apache-commons library
System.out.println(data);
Output: ""
However, if I tweak the input to the following, JSoup cannot sanitize the input.
String data = "<<b>script>alert('vulnerable');<</b>/script>";
data = Jsoup.clean(data, "", Whitelist.none());
data = StringEscapeUtils.unescapeHtml4(data);
System.out.println(data);
Output: <script>alert('vulnerable');</script>
This output is obviously still prone to XSS attacks. Is there a way to fully sanitize the input so that all HTML tags are removed from it?
Answer 1
Score: 2
Not sure if this is the best solution, but a temporary workaround is to parse the raw text into a Document and then clean the combined text of the Document element and all its children:
String unsafe = "<<b>script>alert('vulnerable');<</b>/script>";
Document doc = Jsoup.parse(unsafe);
String safe = Jsoup.clean(doc.text(), Whitelist.none());
System.out.println(safe);
Hopefully someone else can come up with a better solution.
Answer 2
Score: 0
The problem is that you are unescaping the safe HTML that jsoup has made. The output of the Cleaner is HTML. The none safelist passes no tags, only the text nodes, as HTML.
So the input:
<<b>script>alert('vulnerable');<</b>/script>
passed through the Cleaner returns:
&lt;script&gt;alert('vulnerable');&lt;/script&gt;
which is perfectly safe for presenting as HTML. See https://try.jsoup.org/~hfn2nvIglfl099_dVxLQEPxekqg
Just don't include the unescape line.
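To illustrate why the unescape step matters, here is a minimal sketch that does not use jsoup at all. The escapeHtml and unescapeHtml helpers are hypothetical stand-ins written for this example (they are not the jsoup or apache-commons implementations, and they cover only a few entities). They only show that escaped text contains no angle brackets, and that unescaping turns it back into markup:

```java
public class EscapeRoundTrip {

    // Hypothetical helper: escapes the characters that matter in an HTML text context.
    public static String escapeHtml(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    // Hypothetical inverse, standing in for an unescape call such as unescapeHtml4.
    public static String unescapeHtml(String html) {
        return html.replace("&lt;", "<")
                   .replace("&gt;", ">")
                   .replace("&quot;", "\"")
                   .replace("&amp;", "&"); // &amp; goes last so "&amp;lt;" becomes "&lt;", not "<"
    }

    public static void main(String[] args) {
        // The text that remains once the <b> tags are dropped from the tricky input.
        String text = "<script>alert('vulnerable');</script>";

        String safe = escapeHtml(text);          // contains only entities, no real tags
        String unsafeAgain = unescapeHtml(safe); // the original markup is back

        System.out.println(safe);
        System.out.println(unsafeAgain);
    }
}
```

The first printed line is inert when rendered as HTML, while the second is a live script tag again, which is the situation the question ran into.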