Spud wrote:
> I need some code that will split out and recognize common things in an
> RFC 2822 email: headers, body text, quoted sections, signature blocks,
> MIME attachments.
>
> It's not difficult to write a simple header/body parser, but
> recognizing when text is quoted from a previous email or text is part
> of a signature block is non-trivial. It also turns out to be
> deceptively difficult to split out email addresses on the to/from
> lines accurately. A good solution will likely involve some heuristics
> and possibly some machine learning to get decent accuracy.
>
> Does anyone know of such a library?
Mail clients already know how to do these things (more or less well). Are there
any open-source Java mail clients whose parsing code could be reused?
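
In the meantime, the two simplest body heuristics can be sketched in plain Java. This is only a rough starting point and not taken from any existing client: it assumes quoted lines start with ">" and that a signature begins at a line consisting of "-- " (the usual signature separator convention; a bare "--" is accepted as well, since some clients strip the trailing space). The class and method names (BodyLineClassifier, classify, Kind) are made up for illustration. Real mail will break these assumptions often, which is where the heuristics or machine learning you mention would come in.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: labels each line of a plain-text mail body. */
final class BodyLineClassifier {

    enum Kind { TEXT, QUOTED, SIGNATURE }

    /** Returns one Kind per line of the body, in order. */
    static List<Kind> classify(String body) {
        List<Kind> kinds = new ArrayList<>();
        boolean inSignature = false;

        // Split on LF or CRLF; the -1 limit keeps trailing empty lines.
        for (String line : body.split("\\r?\\n", -1)) {
            // Assumption: "-- " (or a stripped "--") starts the signature,
            // and everything after it belongs to the signature block.
            if (!inSignature && (line.equals("-- ") || line.equals("--"))) {
                inSignature = true;
            }

            if (inSignature) {
                kinds.add(Kind.SIGNATURE);
            } else if (line.startsWith(">")) {
                // Assumption: quoted text is marked with a leading ">".
                kinds.add(Kind.QUOTED);
            } else {
                kinds.add(Kind.TEXT);
            }
        }
        return kinds;
    }
}
```

Calling BodyLineClassifier.classify(body) on a message body gives you a per-line label that you can then group into runs of text, quoted material, and signature. It does nothing about attribution lines ("Spud wrote:"), top-posted replies that quote without ">", or address parsing on the To/From lines.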