Velocity Reviews - Computer Hardware Reviews

Re: Email parsing library

 
 
Rainer Frey
05-27-2009
Spud wrote:

> I need some code that will split out and recognize common things in an
> RFC 2822 email: headers, body text, quoted sections, signature blocks,
> mime attachments.


[...]

> (No, Javamail doesn't cut it).


Why?

Rainer
 
Rainer Frey
06-02-2009
Spud wrote:

> Rainer Frey wrote:
>> Spud wrote:
>>
>>> I need some code that will split out and recognize common things in an
>>> RFC 2822 email: headers, body text, quoted sections, signature blocks,
>>> mime attachments.
>>
>> [...]
>
> Because Javamail doesn't distinguish quoted sections or signature blocks
> within the body of the email. It's not designed for deep analysis of the
> content of an email.


I overlooked the requirement for quoted sections and signature blocks. But
since nobody has suggested an alternative, I still think JavaMail is a good
base to start from. It reads the headers, of course, but it also analyzes
the mail body as deep as the MIME parts go. With this you can extract
attachments and distinguish plain-text and HTML mail parts.

Quoted sections and signatures are part of one single plain-text part (which
is the only part in a plain-text mail without any attachments). JavaMail
defines an API (built on the JavaBeans Activation Framework) for content
handlers for a given content type. You could implement and register a
content handler for text/plain that creates an object representation of the
quoted sections, signature and actual text instead of the default string
representation. I don't know of any existing code for this, but the
third-party download page for JavaMail lists several desktop and web mail
clients, some of them open source, that might contain something like this.
Search through http://java.sun.com/products/javamail/Third_Party.html;
unfortunately, not all of the links still work.
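
As a rough illustration of the kind of object representation meant here, the
following is a minimal sketch in plain Java with no JavaMail dependency. The
class BodySplitter and its Segment and Kind types are hypothetical names, not
part of JavaMail or any existing library. It assumes that quoted lines start
with ">" and that the signature begins after a line consisting of "-- ", which
are common conventions but not something every mail client follows.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of JavaMail): splits a text/plain body into typed segments. */
class BodySplitter {

    enum Kind { TEXT, QUOTED, SIGNATURE }

    /** A run of consecutive lines that share the same kind. */
    static final class Segment {
        final Kind kind;
        final List<String> lines = new ArrayList<>();

        Segment(Kind kind) {
            this.kind = kind;
        }
    }

    static List<Segment> split(String body) {
        List<Segment> segments = new ArrayList<>();
        Segment current = null;
        boolean inSignature = false;

        for (String line : body.split("\r?\n")) {
            Kind kind;
            if (inSignature) {
                // Everything after the delimiter is treated as signature.
                kind = Kind.SIGNATURE;
            } else if (line.equals("-- ")) {
                // Assumption: the conventional "-- " line starts the signature.
                inSignature = true;
                continue; // the delimiter line itself is not stored
            } else if (line.startsWith(">")) {
                // Assumption: any line starting with ">" is quoted text.
                kind = Kind.QUOTED;
            } else {
                kind = Kind.TEXT;
            }

            // Start a new segment whenever the kind changes.
            if (current == null || current.kind != kind) {
                current = new Segment(kind);
                segments.add(current);
            }
            current.lines.add(line);
        }
        return segments;
    }

    public static void main(String[] args) {
        String body = "> Why?\n\nBecause of the signature blocks.\n-- \nSpud\n";
        for (Segment s : split(body)) {
            System.out.println(s.kind + " " + s.lines);
        }
    }
}
```

A text/plain content handler could return the list produced by split()
instead of the raw String. The sketch does not handle attribution lines such
as "Spud wrote:", indented quote markers, or a "-- " line that appears inside
quoted text, so real mail will need more heuristics than this.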

Then there is the quite mature but discontinued desktop mail client
ColumbaMail at http://columbamail.org, which seems to contain code that at
least marks quoted sections.

Rainer
 