Java XSS Sanitization for nested HTML elements

Question


I am using the JSoup library in Java to sanitize input to prevent XSS attacks. It works well for simple inputs like <script>alert('vulnerable')</script>.

Example:

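import org.jsoup.Jsoup;
import org.jsoup.safety.Whitelist;
import org.apache.commons.text.StringEscapeUtils; // or org.apache.commons.lang3.StringEscapeUtils

String data = "<script>alert('vulnerable')</script>";
data = Jsoup.clean(data, Whitelist.none());
data = StringEscapeUtils.unescapeHtml4(data); // StringEscapeUtils from Apache Commons
System.out.println(data);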

Output: ""

However, if I tweak the input to the following, JSoup cannot sanitize the input.

String data = "<<b>script>alert('vulnerable');<</b>/script>";
data = Jsoup.clean(data, Whitelist.none());
data = StringEscapeUtils.unescapeHtml4(data);
System.out.println(data);

Output: <script>alert('vulnerable');</script>

This output is obviously still prone to XSS attacks. Is there a way to fully sanitize the input so that all HTML tags are removed from it?

Answer 1

Score: 2


Not sure if this is the best solution, but a temporary workaround would be to parse the raw text into a Document and then clean the combined text of the document element and all its children:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.safety.Whitelist;

String unsafe = "<<b>script>alert('vulnerable');<</b>/script>";
Document doc = Jsoup.parse(unsafe);
String safe = Jsoup.clean(doc.text(), Whitelist.none());
System.out.println(safe);

I'll wait for someone else to come up with a better solution.
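A minimal sketch of this workaround as a reusable method, assuming jsoup 1.14.1 or later, where Whitelist was renamed Safelist; the class and method names are hypothetical. The first parse treats each stray "<" as text, text() joins the pieces back into "<script>alert('vulnerable');</script>", and the second clean then sees a real script element and drops it. As with any Cleaner output, the result is HTML and should not be unescaped afterwards.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.safety.Safelist;

public class CombinedTextCleaner {

    // Hypothetical helper implementing the parse-then-clean workaround.
    static String cleanCombinedText(String unsafe) {
        // First pass: the parser treats each stray '<' as text, so the input
        // becomes text nodes plus a <b> element, with no real <script> tag.
        Document doc = Jsoup.parse(unsafe);
        // text() concatenates those text nodes, re-forming "<script>...</script>".
        String combined = doc.text();
        // Second pass: the cleaner now sees a real <script> element, and
        // Safelist.none() drops it together with its script content.
        return Jsoup.clean(combined, Safelist.none());
    }

    public static void main(String[] args) {
        String unsafe = "<<b>script>alert('vulnerable');<</b>/script>";
        // Brackets make an empty result visible on the console.
        System.out.println("[" + cleanCombinedText(unsafe) + "]");
    }
}
```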

Answer 2

Score: 0


The problem is that you are unescaping the safe HTML that jsoup produced. The output of the Cleaner is HTML, not plain text: the none safelist lets no tags through and passes only the text nodes, escaped as HTML.

So the input:

<<b>script>alert('vulnerable');<</b>/script>

returns, after going through the Cleaner:

&lt;script&gt;alert('vulnerable');&lt;/script&gt;

which is perfectly safe for presenting as HTML. See https://try.jsoup.org/~hfn2nvIglfl099_dVxLQEPxekqg

Just don't include the unescape line.
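A minimal sketch of the question's code with the unescape line removed, assuming jsoup 1.14.1 or later (Safelist in place of Whitelist); the class and method names are hypothetical. For illustration only, unescapeAgain uses jsoup's Parser.unescapeEntities as a stand-in for StringEscapeUtils.unescapeHtml4, to show that the removed step is what turns the escaped text back into a live script tag.

```java
import org.jsoup.Jsoup;
import org.jsoup.parser.Parser;
import org.jsoup.safety.Safelist;

public class SanitizeWithoutUnescape {

    static final String NESTED = "<<b>script>alert('vulnerable');<</b>/script>";

    // Returns safe HTML: Safelist.none() drops every tag and keeps only the
    // text nodes, with '<' in that text written out as an entity.
    static String sanitize(String data) {
        return Jsoup.clean(data, Safelist.none());
    }

    // Stand-in for the removed unescapeHtml4 line: it turns the escaped text
    // back into raw markup. Do not apply this to output destined for a page.
    static String unescapeAgain(String html) {
        return Parser.unescapeEntities(html, false);
    }

    public static void main(String[] args) {
        String safe = sanitize(NESTED);
        System.out.println(safe);                // escaped text, no raw tags
        System.out.println(unescapeAgain(safe)); // the raw <script> tag is back
    }
}
```

Write the value returned by sanitize directly into the HTML page; escaping is already done, so adding the unescape step back is what reintroduces the XSS.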

huangapple
  • Published on October 4, 2020 at 08:02:35
  • If reposting, please keep the link to this article: https://go.coder-hub.com/64190046.html