
Find byte offsets for e-mail attachments

Question


I have a requirement to deliver emails to a legacy system that needs to read the attachments.

For each part in a multipart email, I need to provide the byte offset at which that attachment starts in the email, so that the legacy system doesn't need to know how to parse emails.

Performance and memory usage are a concern, so the solution can't load the entire email into memory. As far as I can tell, that rules out javax.mail.

How would you go about it in Java?

My first idea was to use mime4j, but the library does not keep track of byte offsets or even line numbers.
I looked into submitting a PR to mime4j to add line-number and byte-offset tracking, but it is not easy: it is a very mature project and it uses a lot of buffering internally.
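One way around the buffering problem is to keep your own running byte count while reading the raw message, so that any buffering underneath no longer matters. The following is a minimal sketch of that idea, not a MIME parser. It assumes the multipart boundary has already been extracted from the top-level `Content-Type` header, that the message is a single-level `multipart/*` (nested multiparts are not followed), and that lines end in CRLF or LF. `PartOffsetScanner`, `PartOffsets`, and `scan` are hypothetical names. A body is taken to start at the first byte after the blank line that ends the part headers, and to end before the line break preceding the next delimiter, which RFC 2046 treats as part of the delimiter.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: finds body offsets in a single-level multipart message. */
final class PartOffsetScanner {

    /** Byte range [bodyStart, bodyEnd) of one part's (still transfer-encoded) body. */
    record PartOffsets(long bodyStart, long bodyEnd) {}

    static List<PartOffsets> scan(InputStream raw, String boundary) throws IOException {
        String delimiter = "--" + boundary;
        String closeDelimiter = delimiter + "--";
        InputStream in = new BufferedInputStream(raw);
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        List<PartOffsets> parts = new ArrayList<>();

        long offset = 0;           // offset of the next unread byte in the message
        long prevContentEnd = 0;   // end of the previous line, excluding its line break
        long bodyStart = -1;       // -1 while not inside a part body
        boolean inPartHeaders = false;

        while (true) {
            long lineStart = offset;
            line.reset();
            int read = 0;
            int b;
            while ((b = in.read()) != -1) {  // only one line is held in memory
                read++;
                if (b == '\n') break;
                line.write(b);
            }
            if (read == 0) break;  // end of input
            offset += read;

            byte[] bytes = line.toByteArray();
            int len = bytes.length;
            if (len > 0 && bytes[len - 1] == '\r') len--;  // drop the CR of a CRLF ending
            // ISO-8859-1 maps each byte to one char; trailing whitespace is transport padding.
            String text = new String(bytes, 0, len, StandardCharsets.ISO_8859_1).stripTrailing();

            if (text.equals(delimiter) || text.equals(closeDelimiter)) {
                if (bodyStart >= 0) {
                    // The line break before a delimiter belongs to the delimiter (RFC 2046).
                    parts.add(new PartOffsets(bodyStart, Math.max(bodyStart, prevContentEnd)));
                }
                if (text.equals(closeDelimiter)) break;  // ignore the epilogue
                inPartHeaders = true;
                bodyStart = -1;
            } else if (inPartHeaders && len == 0) {
                inPartHeaders = false;
                bodyStart = offset;  // first byte after the blank line
            }
            prevContentEnd = lineStart + len;
        }
        return parts;
    }
}
```

Only one line is buffered at a time, so memory use depends on the longest line rather than on the size of the message.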

Now I am thinking that maybe I am going about this the wrong way, so I would very much appreciate any ideas on how to solve this in a simple manner.

Answer 1

Score: 1


You're going to run into issues just sending the byte offsets and the full email, because the parts can still be base64 or quoted-printable encoded.
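To make that concrete: the bytes between a part's offsets are the transfer-encoded text, so a consumer that only has offsets still has to decode them. The following sketch assumes a base64 part and offsets computed elsewhere, and streams the decoded bytes using only the JDK. It skips to the body start, caps reading at the body end with a small hypothetical `LimitedInputStream`, and wraps the result in `Base64.getMimeDecoder()`, which ignores the line breaks between base64 lines. `Base64PartReader` and `decodedBase64Part` are hypothetical names. The JDK has no quoted-printable decoder, so that encoding would need separate handling.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Base64;

/** Hypothetical helper: streams the decoded bytes of one base64 body part. */
final class Base64PartReader {

    /** Caps an underlying stream at a fixed number of bytes. */
    static final class LimitedInputStream extends FilterInputStream {
        private long remaining;

        LimitedInputStream(InputStream in, long limit) {
            super(in);
            this.remaining = limit;
        }

        @Override
        public int read() throws IOException {
            if (remaining <= 0) return -1;
            int b = super.read();
            if (b != -1) remaining--;
            return b;
        }

        @Override
        public int read(byte[] buf, int off, int len) throws IOException {
            if (len == 0) return 0;
            if (remaining <= 0) return -1;
            int n = super.read(buf, off, (int) Math.min(len, remaining));
            if (n > 0) remaining -= n;
            return n;
        }

        @Override
        public long skip(long n) throws IOException {
            long skipped = super.skip(Math.min(n, remaining));
            remaining -= skipped;
            return skipped;
        }

        @Override
        public int available() throws IOException {
            return (int) Math.min(super.available(), remaining);
        }

        @Override
        public boolean markSupported() {
            return false;  // mark/reset would desynchronise the byte budget
        }
    }

    /** Returns a stream of the decoded bytes in [bodyStart, bodyEnd) of the raw message. */
    static InputStream decodedBase64Part(InputStream rawMessage, long bodyStart, long bodyEnd)
            throws IOException {
        rawMessage.skipNBytes(bodyStart);  // Java 12+; throws EOFException if too short
        InputStream encoded = new LimitedInputStream(rawMessage, bodyEnd - bodyStart);
        // The MIME decoder skips CRLFs and other non-alphabet bytes between base64 lines.
        return Base64.getMimeDecoder().wrap(encoded);
    }
}
```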

You'll want to use mime4j's MimeStreamParser, give it your own ContentHandler, and override the body method. You can then send the BodyDescriptor and InputStream directly to the legacy system. The InputStream is the "decoded" body of the part (i.e. any Content-Transfer-Encoding has been undone, provided content decoding is enabled on the parser with setContentDecoding(true), which is off by default). The BodyDescriptor is useful for extracting information you may care about from the part's headers (MimeType and Charset are the most useful ones).
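A minimal sketch of that approach, written against the mime4j-core 0.8.x package layout (mime4j-core must be on the classpath). `LegacyForwarder` and `LegacySink` are hypothetical stand-ins for however the legacy system actually receives a part. The sketch turns content decoding on explicitly so that `body()` receives decoded bytes.

```java
import java.io.IOException;
import java.io.InputStream;

import org.apache.james.mime4j.MimeException;
import org.apache.james.mime4j.parser.AbstractContentHandler;
import org.apache.james.mime4j.parser.MimeStreamParser;
import org.apache.james.mime4j.stream.BodyDescriptor;

/** Hypothetical forwarder: streams each leaf part of a message to a legacy sink. */
final class LegacyForwarder {

    /** Hypothetical stand-in for however the legacy system receives one part. */
    interface LegacySink {
        void deliver(String mimeType, String charset, InputStream decodedBody) throws IOException;
    }

    static void forward(InputStream rawMessage, LegacySink sink) throws MimeException, IOException {
        MimeStreamParser parser = new MimeStreamParser();
        // Off by default: without this, body() sees the raw base64 / quoted-printable bytes.
        parser.setContentDecoding(true);
        parser.setContentHandler(new AbstractContentHandler() {
            @Override
            public void body(BodyDescriptor bd, InputStream is) throws MimeException, IOException {
                // Called once per leaf part. The parser continues with the next part after
                // this returns, so the sink must consume the stream inside this call.
                sink.deliver(bd.getMimeType(), bd.getCharset(), is);
            }
        });
        parser.parse(rawMessage);  // reads the message as a stream, part by part
    }
}
```

Because the parser moves on once `body()` returns, the sink should copy or forward the stream (for example with `InputStream.transferTo`) before returning rather than holding on to it.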

This does not buffer the whole email and lets you stream out just the body parts. I'm not sure how you're communicating with the legacy system (over the network, or as an in-process subcomponent), but hopefully this works!

  • Posted on October 1, 2020 at 20:49:34
  • When reposting, please keep the link to this article: https://go.coder-hub.com/64155766.html