What is the best way to get text from an .eml file?

Question

I'm trying to get the sender, recipients, subject, and message body from several .eml files on my local drive. I've tried Apache Commons Email, but sometimes it loops without reporting any errors. Here is my code, which is supposed to extract the text from an .eml file and save it to a .txt file:

            MimeMessage mimeMessage = MimeMessageUtils.createMimeMessage(null, file);
            MimeMessageParser parser = new MimeMessageParser(mimeMessage);

            if (parser.parse().hasPlainContent()) {
                //Trying to get text of the message
                try (FileWriter writer = new FileWriter(txtName)) {
                    writeHeaders(writer, parser);
                    writer.write(parser.parse().getPlainContent());
                } catch (IOException e) {
                    e.printStackTrace();
                }
            } else if (parser.parse().hasHtmlContent()) {
                try (FileWriter writer = new FileWriter(txtName)) {
                    writeHeaders(writer, parser);
                    String text = Jsoup.parse(parser.parse().getHtmlContent()).text();
                    writer.write(text);
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }

Here is the writeHeaders method:

    private void writeHeaders(FileWriter writer, MimeMessageParser parser) throws Exception {
        writer.write("From :" + parser.getFrom() + "\n");
        writer.write("To:" + parser.getTo() + "\n");
        writer.write("Subject:" + parser.getSubject() + "\n");
        writer.write("Message:" + "\n" + "\n");
    }

And here is the code that gets the attachments:

          if (parser.parse().hasAttachments()) {
                //Getting and saving attachments from eml
                List<DataSource> attachments = parser.parse().getAttachmentList();
                for (DataSource attachment : attachments) {
                    if (attachment.getName() != null && !attachment.getName().isEmpty()) {
                        try (InputStream is = attachment.getInputStream()) {
                            File save = new File(saveDir + File.separator + attachment.getName());
                            FileOutputStream fos = new FileOutputStream(save);
                            byte[] buf = new byte[4096];
                            int bytesRead;
                            while ((bytesRead = is.read(buf)) != -1) {
                                fos.write(buf, 0, bytesRead);
                            }
                            fos.close();
                            if (save.getName().endsWith("eml")) {
                                parseEml(save, count);
                            }
                        } catch (Exception e) {
                            e.printStackTrace();
                        }
                    }
                }
          }

So, is there an easier way to get the text and attachments?
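
As a side note on the attachment loop above, the manual buffer copy could probably be replaced with `Files.copy` from the JDK. The sketch below is only an illustration under stated assumptions: `AttachmentSaver`, `safeFileName`, `save`, and the `attachment.bin` fallback name are hypothetical and not part of Apache Commons Email, and the stream is assumed to come from something like `attachment.getInputStream()`. It also reduces the attachment name to its last path segment, since that name comes from the message itself.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper for illustration; not part of Apache Commons Email.
final class AttachmentSaver {

    private AttachmentSaver() {}

    /** Keeps only the last path segment so a name like "../../x.eml" cannot leave saveDir. */
    static String safeFileName(String rawName) {
        String normalized = rawName.replace('\\', '/');
        String last = normalized.substring(normalized.lastIndexOf('/') + 1).trim();
        if (last.isEmpty() || last.equals(".") || last.equals("..")) {
            return "attachment.bin"; // arbitrary fallback chosen for this sketch
        }
        return last;
    }

    /** Copies the stream into saveDir and returns the path that was written. */
    static Path save(InputStream in, Path saveDir, String rawName) throws IOException {
        Files.createDirectories(saveDir);
        Path target = saveDir.resolve(safeFileName(rawName));
        // Files.copy reads the stream to the end; the caller stays responsible for closing it.
        Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        return target;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("eml-attachments");
        byte[] data = "hello".getBytes(StandardCharsets.UTF_8);
        try (InputStream in = new ByteArrayInputStream(data)) {
            Path saved = save(in, dir, "../notes.txt");
            System.out.println("Saved to " + saved);
        }
    }
}
```

In the loop above this might be called as `AttachmentSaver.save(is, Paths.get(saveDir), attachment.getName())`, assuming `saveDir` is a `String`. Calling `parser.parse()` once up front and reusing the parser would also avoid parsing the message again for every check.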

Answer 1

Score: 3

Yes, much easier. Simple Java Mail (GitHub) can read .eml files and makes their content easy to access. If you run into something like a looping error there too (unlikely), I'll be happy to help (I actively maintain Simple Java Mail):

Email email = EmailConverter.emlToEmail(emlFile);

email.getFromRecipient();
email.getSubject();
email.getPlainText();
email.getHTMLText();
email.getAttachments();
email.getEmbeddedImages();
email.getHeaders();
// etc. etc.

It also supports S/MIME-encrypted emails (if you have the certificates required to decrypt them).
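
To get from those getters to the text file the question asks for, something like the following might work. It is a minimal sketch that uses only the JDK: `EmlTextWriter` and its `write` method are hypothetical names, not part of Simple Java Mail, and the arguments are assumed to be strings obtained from the getters above (for example `email.getSubject()` and `email.getPlainText()`). It writes UTF-8 explicitly, because `FileWriter(String)` as used in the question relies on the JVM's default charset, and it treats missing values as empty strings.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for illustration; not part of Simple Java Mail.
final class EmlTextWriter {

    private EmlTextWriter() {}

    /** Returns the value, or an empty string when the message lacks that field. */
    private static String orEmpty(String value) {
        return value == null ? "" : value;
    }

    /** Writes the header block followed by the body, always encoded as UTF-8. */
    static void write(Path target, String from, String to, String subject, String body)
            throws IOException {
        try (BufferedWriter writer = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            writer.write("From: " + orEmpty(from) + "\n");
            writer.write("To: " + orEmpty(to) + "\n");
            writer.write("Subject: " + orEmpty(subject) + "\n");
            writer.write("Message:\n\n");
            writer.write(orEmpty(body));
        }
    }

    public static void main(String[] args) throws IOException {
        Path out = Files.createTempFile("message", ".txt");
        write(out, "alice@example.com", "bob@example.com", "Hello", "Body text");
        System.out.println(new String(Files.readAllBytes(out), StandardCharsets.UTF_8));
    }
}
```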

Posted by huangapple on September 29, 2020, 13:22:14
When reposting, please keep the link to this article: https://go.coder-hub.com/64113363.html