August 26, 2020, 14:58:27


Replace ASCII codes and HTML tags in Java

Question

How can I achieve the expected result below without using `StringEscapeUtils`?

    public class Main {
        public static void main(String[] args) throws Exception {
          String str = "<p><b>Send FWB <br><br> &#40;if AWB has COU SHC, <br> if ticked , will send FWB&#41;</b></p>";
          str = str.replaceAll("\\<.*?\\>", "");
          System.out.println("After removing HTML Tags: " + str);
        }
    }

**Current result:**

    After removing HTML Tags: Send FWB  &#40;if AWB has COU SHC,  if ticked , will send FWB&#41;

**Expected result:**

    After removing HTML Tags: Send FWB  if AWB has COU SHC,  if ticked , will send FWB;

Already checked:
https://stackoverflow.com/questions/994331/how-to-unescape-html-character-entities-in-java
<hr>
**Note:** This is just a sample; the input may vary.


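Answer 1

Score: 1

Your regular expression matches HTML tags such as `<something>`, but it does not match HTML entities. Entities follow a pattern like `&.*?;`, which you are not replacing.

This should solve your problem:

    str = str.replaceAll("\\<.*?\\>|&.*?;", "");

If you want to experiment with this regex in a sandbox, try regxr.com with `(\<.*?\>)|(&.*?;)`. The parentheses make the two capturing groups easy to tell apart in the tool; they are not needed in your code. Note that the `\` does not need to be escaped in that sandbox, but it must be doubled in your code because the pattern sits inside a Java string literal.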
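As a rough illustration of the answer's suggestion, the sketch below applies the combined pattern to the sample string from the question. The class name `TagStripper` and the methods `strip` and `stripStrict` are hypothetical names chosen for this example. The stricter pattern is an assumption added here, not part of the answer: since the input may vary, it only removes well-formed entities such as `&#40;` or `&amp;`, so a stray `&` followed later by a `;` is left alone.

```java
import java.util.regex.Pattern;

public class TagStripper {
    // The answer's pattern: an HTML tag, or anything from '&' up to the next ';'.
    private static final Pattern TAG_OR_ENTITY =
            Pattern.compile("\\<.*?\\>|&.*?;");

    // Assumed stricter variant (not from the answer): a tag, or a well-formed
    // decimal, hexadecimal, or named entity.
    private static final Pattern TAG_OR_STRICT_ENTITY =
            Pattern.compile("<[^>]*>|&(?:#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);");

    // Deletes tags and entities; entities are removed, not decoded.
    public static String strip(String input) {
        return TAG_OR_ENTITY.matcher(input).replaceAll("");
    }

    // Same idea, but ignores a bare '&' that does not start an entity.
    public static String stripStrict(String input) {
        return TAG_OR_STRICT_ENTITY.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        String str = "<p><b>Send FWB <br><br> &#40;if AWB has COU SHC, <br> if ticked , will send FWB&#41;</b></p>";
        // prints: After removing HTML Tags: Send FWB  if AWB has COU SHC,  if ticked , will send FWB
        System.out.println("After removing HTML Tags: " + strip(str));

        // The loose pattern also swallows "& Jerry;" here; the strict one does not.
        System.out.println(strip("Tom & Jerry; <i>cartoon</i>"));
        System.out.println(stripStrict("Tom & Jerry; <i>cartoon</i>"));
    }
}
```

Both methods delete entities rather than translating them back to characters, which matches the expected output in the question (the parentheses encoded as `&#40;` and `&#41;` disappear). If the characters themselves are wanted, the entities would need to be decoded instead.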

