Replace ASCII codes and HTML tags in Java
Question
How can I achieve the expected result below without using StringEscapeUtils?
    public class Main {
        public static void main(String[] args) throws Exception {
            String str = "<p><b>Send FWB <br><br> &#40;if AWB has COU SHC, <br> if ticked , will send FWB&#41;</b></p>";
            str = str.replaceAll("\\<.*?\\>", "");
            System.out.println("After removing HTML Tags: " + str);
        }
    }
Current result:
After removing HTML Tags: Send FWB &#40;if AWB has COU SHC, if ticked , will send FWB&#41;
Expected result:
After removing HTML Tags: Send FWB if AWB has COU SHC, if ticked , will send FWB;
Already checked:
https://stackoverflow.com/questions/994331/how-to-unescape-html-character-entities-in-java
PS: This is just a sample; the input may vary.
Answer 1
Score: 1
Your regex matches HTML tags such as <something>, but it does not match HTML entities. Entities follow a pattern like &.*?;, which you are not replacing.
This should solve your problem:
str = str.replaceAll("\\<.*?\\>|&.*?;", "");
If you want to experiment with this in a sandbox, try regxr.com and use (\<.*?\>)|(&.*?;). The parentheses make the two capturing groups easy to identify in the tool and are not needed in your code. Note that the \ does not need to be escaped in that sandbox, but it does in your code, because there the pattern sits inside a Java string literal.
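For reference, here is a minimal, self-contained sketch of the same idea. It is not the answer's exact code. The class name EntityStripper and both method names are hypothetical, and the entity patterns are an assumption that is deliberately stricter than &.*?;, so that stray text such as "a & b; c" is left alone. The second method is an optional variation that decodes numeric references such as &#40; into their characters rather than deleting them, in case that is closer to what you need.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class EntityStripper {

    // A tag: "<", then the shortest run of characters up to the next ">".
    private static final Pattern TAG = Pattern.compile("<.*?>");

    // Assumption: an entity looks like "&name;", "&#123;" or "&#x1F;".
    private static final Pattern ENTITY =
            Pattern.compile("&(?:#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);");

    // Numeric references only; digit counts are capped so parsing cannot overflow an int.
    private static final Pattern NUMERIC_ENTITY =
            Pattern.compile("&#(?:[xX]([0-9a-fA-F]{1,6})|([0-9]{1,7}));");

    // Removes tags first, then removes every entity.
    static String stripTagsAndEntities(String html) {
        String withoutTags = TAG.matcher(html).replaceAll("");
        return ENTITY.matcher(withoutTags).replaceAll("");
    }

    // Removes tags, then replaces numeric references with the character they encode.
    static String stripTagsDecodeNumeric(String html) {
        String withoutTags = TAG.matcher(html).replaceAll("");
        Matcher m = NUMERIC_ENTITY.matcher(withoutTags);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int codePoint = (m.group(1) != null)
                    ? Integer.parseInt(m.group(1), 16)   // hexadecimal form
                    : Integer.parseInt(m.group(2));      // decimal form
            // Leave the reference untouched if it is not a valid code point.
            String replacement = Character.isValidCodePoint(codePoint)
                    ? new String(Character.toChars(codePoint))
                    : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String str = "<p><b>Send FWB <br><br> &#40;if AWB has COU SHC, <br> if ticked , will send FWB&#41;</b></p>";
        System.out.println("Entities removed: " + stripTagsAndEntities(str));
        System.out.println("Entities decoded: " + stripTagsDecodeNumeric(str));
    }
}
```

Removing a tag leaves the whitespace around it in place, so the output can contain double spaces. If that matters, a follow-up replaceAll("\\s+", " ") would collapse them. Regex-based stripping is also only a rough tool: it will misfire on input such as a > inside an attribute value, so for arbitrary HTML a real parser is the safer choice.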