Convert HTML to text in Java

Question

I have a Java string like this:

    String str = "<table><tr><td>ALL,</td></tr><tr><td></td></tr><tr><td> Please find attached file for this week<tr><td></td></tr><tr><td>Thanks</td></tr><tr><td>Support Team</td></tr>";

I want output like this:

> All,
>
> Please find attached file for this week.
>
> Thanks,
> Support Team

Answer 1

Score: 2

You should really use a proper HTML parser, but if you want something quick and dirty and your HTML is well-formed, you can use the classes in the javax.swing.text.html package:

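    import java.io.StringReader;
    import javax.swing.text.Document;
    import javax.swing.text.html.HTMLDocument;
    import javax.swing.text.html.HTMLEditorKit;

    public static String stripTags(String content) throws Exception {
        HTMLEditorKit kit = new HTMLEditorKit();
        Document doc = new HTMLDocument();
        // Do not fail if the markup contains a <meta> charset declaration
        doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
        // Read the string directly instead of going through platform-default bytes
        kit.read(new StringReader(content), doc, 0);
        // The document's text is the content with all tags stripped
        return doc.getText(0, doc.getLength());
    }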
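If the goal is one line of text per table row, as in the desired output in the question, the parser callback from the same Swing package may be a better fit than reading the whole document text. The following is only a minimal sketch, assuming well-formed input; the class name `HtmlRowsToText` and the method `rowsToText` are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: collects text from an HTML table, one line per row.
final class HtmlRowsToText {

    static String rowsToText(String html) throws IOException {
        StringBuilder out = new StringBuilder();

        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                // Append the cell text without surrounding whitespace
                out.append(new String(data).trim());
            }

            @Override
            public void handleEndTag(HTML.Tag tag, int pos) {
                // Start a new output line whenever a table row closes
                if (HTML.Tag.TR.equals(tag)) {
                    out.append('\n');
                }
            }
        };

        // 'true' tells the parser to ignore any charset directive in the markup
        new ParserDelegator().parse(new StringReader(html), callback, true);
        return out.toString();
    }
}
```

Calling `HtmlRowsToText.rowsToText(str)` returns the row texts separated by newlines. Note that the string in the question is not well-formed (the third row never closes its `<td>` and `<tr>`, and `</table>` is missing), so the result for that exact input should be checked rather than assumed.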

huangapple
  • Posted on June 22, 2023, 17:48:05
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76530618.html