Java regex to remove styles from HTML tags for Jasper text field

Question

As the title says, I am looking for the safest Java regex to remove style attributes from HTML tags in text intended for a Jasper text field with its markup set to HTML, without touching the content or breaking the tag structure. For example, given the following input received from the front end:

<p>This text contains <sub style="background-color:powderblue;">subscript</sub> text.</p> 

Sorry for the unescaped quotes. I found that this code works fine:

String output = input.replaceAll("style=\"[^>]*\"","");

The output should then be:

 <p>This text contains <sub>subscript</sub> text.</p>
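
As a quick sanity check, here is a minimal sketch that runs the one-liner above on the sample input. The class and method names (QuestionRegexCheck, strip) are made up for illustration. It suggests two things worth knowing: the space before style is left behind, and because [^>]* is greedy, the match runs to the last quote before the closing >, so any attribute that follows style in the same tag is swallowed as well.

```java
final class QuestionRegexCheck {

    // Hypothetical helper wrapping the one-liner from the question unchanged.
    static String strip(String input) {
        return input.replaceAll("style=\"[^>]*\"", "");
    }

    public static void main(String[] args) {
        String sample = "<p>This text contains "
                + "<sub style=\"background-color:powderblue;\">subscript</sub> text.</p>";
        // The style attribute is removed, but the space before it stays: <sub >
        System.out.println(strip(sample));

        // Greedy [^>]* extends to the last quote in the tag,
        // so the class attribute is removed together with style.
        String twoAttributes = "<sub style=\"color:red\" class=\"note\">x</sub>";
        System.out.println(strip(twoAttributes));
    }
}
```

Whether the leftover space matters for Jasper is not something this check can tell; it only shows what the string looks like after the replacement.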

Answer 1

Score: 1

First off, a regex on its own doesn't remove anything. A regex only describes a pattern of characters to match; the removal is done by whatever method you hand the pattern to.

That aside, this code using replaceAll should do the trick:

String output = input.replaceAll(
        "(<[^>]+?)\\s+style\\s*=\\s*['\"][^'\"]*['\"](.*?>)", "$1$2");
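
To try this out, here is a small self-contained sketch. The class and method names (StyleAttributeStripper, stripAsAnswered, stripQuoteAware) are hypothetical. The first method uses the regex above unchanged. The second is a variant of my own, not part of the answer, under the assumption that a style value may contain the other quote character (for example style="font-family:'Arial'"). The original pattern accepts either quote as the closing delimiter, so it can mangle such a tag; the variant pairs each opening quote with the same closing quote and ignores case.

```java
import java.util.regex.Pattern;

final class StyleAttributeStripper {

    // The regex from the answer, unchanged.
    private static final Pattern AS_ANSWERED = Pattern.compile(
            "(<[^>]+?)\\s+style\\s*=\\s*['\"][^'\"]*['\"](.*?>)");

    // Assumed variant: the value is either "..." or '...', so a quote of the
    // other kind inside the value does not end the match early.
    private static final Pattern QUOTE_AWARE = Pattern.compile(
            "(<[^>]*?)\\s+style\\s*=\\s*(?:\"[^\"]*\"|'[^']*')",
            Pattern.CASE_INSENSITIVE);

    static String stripAsAnswered(String html) {
        // Keep the tag text before and after the style attribute.
        return AS_ANSWERED.matcher(html).replaceAll("$1$2");
    }

    static String stripQuoteAware(String html) {
        // Only the part of the tag before the attribute is captured;
        // the rest of the tag is not consumed, so it is left in place.
        return QUOTE_AWARE.matcher(html).replaceAll("$1");
    }

    public static void main(String[] args) {
        String sample = "<p>This text contains "
                + "<sub style=\"background-color:powderblue;\">subscript</sub> text.</p>";
        System.out.println(stripAsAnswered(sample));
        System.out.println(stripQuoteAware(sample));

        String nestedQuotes = "<sub style=\"font-family:'Arial'\">x</sub>";
        System.out.println(stripQuoteAware(nestedQuotes));
    }
}
```

Both patterns remove at most one style attribute per tag in a single pass, and neither understands comments or script blocks. If the input is not under your control, an HTML parser is likely the safer tool.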

Posted by huangapple on April 13, 2023 at 15:07:40
Source link: https://go.coder-hub.com/76002576.html