Building email threads from unix mailboxes

 
 
Jed Parsons
      10-18-2004
What headers do I have to know about to build thread trees from Unix
mailboxes?

Is it enough to get the In-Reply-To header for each message and build a
dictionary of { Message-ID: message } pairs? Or is it more complicated
than that?

If there isn't already a module to do this (and apologies if there is
one and I don't know about it), are the current tools of choice the
'email' and 'mailbox' modules? (And I guess I'd want to use the mime
decoding tools in 'email' to deal with messages that come with
attachments or html or other stuff.)

Thanks for any tips,

Jed

 
Josiah Carlson
      10-18-2004
> Is it enough to get the In-Reply-To header for each message and build a
> dictionary of { Message-ID: message } pairs? Or is it more complicated
> than that?


For mail from RFC 2822-compliant clients, the In-Reply-To and References
headers are sufficient. Some clients add other headers, and not all
clients are RFC 2822 compliant.
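
As a rough illustration of the dictionary approach discussed here, the sketch below indexes messages by Message-ID and records a parent for each one, preferring In-Reply-To and falling back to the last entry in References. The names `parent_id` and `index_messages` are made up for this example, and the regular expression is a simplification of the RFC's msg-id syntax, so treat it as a starting point rather than a complete parser.

```python
import email
import re

# Simplified pattern for "<id@host>" tokens; an assumption, not the full RFC grammar.
MSGID_RE = re.compile(r"<[^<>\s]+>")


def parent_id(msg):
    """Return the Message-ID this message appears to reply to, or None."""
    in_reply_to = MSGID_RE.findall(str(msg.get("In-Reply-To", "")))
    if in_reply_to:
        return in_reply_to[0]
    # Fall back to the most recent entry in References, if any.
    references = MSGID_RE.findall(str(msg.get("References", "")))
    return references[-1] if references else None


def index_messages(messages):
    """Build {Message-ID: message} and {Message-ID: parent Message-ID or None}."""
    by_id = {}
    parents = {}
    for msg in messages:
        ids = MSGID_RE.findall(str(msg.get("Message-ID", "")))
        if not ids:
            continue  # no usable Message-ID, so it cannot be threaded by header
        by_id[ids[0]] = msg
        parents[ids[0]] = parent_id(msg)
    return by_id, parents
```

Any iterable of `email.message.Message` objects should work as input; a `mailbox.mbox` object opened on a Unix mailbox file yields such messages when iterated.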

- Josiah

 
Jed Parsons
      10-18-2004

Thanks.

Is the References header a running list of all the In-Reply-To headers
so far in the thread?

 
Erik Max Francis
      10-18-2004
Jed Parsons wrote:

> Is the References header a running list of all the In-Reply-To headers
> so far in the thread?


It depends on the service. Some keep only the last few references, some
keep only one, and some retain the full list from the very beginning (at
least as far as the RFC will allow).

If you want robust threading, you'd probably go by In-Reply-To and
References, backtracking manually (rather than relying on any given
References list to be complete), and then, for systems like
mail-to-news gateways which may break the In-Reply-To/References chain,
group by similar subjects posted around the same time.
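
A minimal sketch of those two steps, under some assumptions: `parents` is a hypothetical dict mapping each Message-ID to its parent's Message-ID (or None), each thread root is given as a `(msg_id, subject, timestamp)` tuple with the timestamp in seconds, and the one-week default window is an arbitrary choice for illustration. The prefix-stripping pattern only handles "Re:" and "Fw:"/"Fwd:" and is not exhaustive.

```python
import re

# Strips leading "Re:", "Fw:", and "Fwd:" prefixes (possibly repeated).
REPLY_PREFIX_RE = re.compile(r"^\s*((re|fwd?)\s*:\s*)+", re.IGNORECASE)


def find_root(msg_id, parents):
    """Walk parent links upward one step at a time until none is left.

    The result may be the ID of a message that is missing from the mailbox,
    so that replies to the same lost message still share a root.
    """
    seen = {msg_id}
    while True:
        parent = parents.get(msg_id)
        if parent is None or parent in seen:  # top of chain, or a loop
            return msg_id
        seen.add(parent)
        msg_id = parent


def normalize_subject(subject):
    """Reduce a subject to a comparison key without reply prefixes."""
    return REPLY_PREFIX_RE.sub("", subject).strip().lower()


def group_roots_by_subject(roots, window_seconds=7 * 24 * 3600):
    """Group (msg_id, subject, timestamp) roots with matching subjects
    whose timestamps fall within window_seconds of the previous match."""
    groups = []
    latest = {}  # normalized subject -> (group list, last timestamp seen)
    for msg_id, subject, timestamp in sorted(roots, key=lambda r: r[2]):
        key = normalize_subject(subject)
        entry = latest.get(key)
        if entry is not None and timestamp - entry[1] <= window_seconds:
            entry[0].append(msg_id)
            latest[key] = (entry[0], timestamp)
        else:
            group = [msg_id]
            groups.append(group)
            latest[key] = (group, timestamp)
    return groups
```

Header-based linking would run first; the subject grouping is only meant as a fallback for roots whose header chain was broken.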

--
__ Erik Max Francis && (E-Mail Removed) && http://www.alcyone.com/max/
/ \ San Jose, CA, USA && 37 20 N 121 53 W && AIM erikmaxfrancis
\__/ Love is the triumph of imagination over intelligence.
-- H.L. Mencken
 
Mark Rowe
      10-19-2004

On Oct 19, 2004, at 11:24 AM, Jed Parsons wrote:

> What headers do I have to know about to build thread trees from Unix
> mailboxes?
>
> Is it enough to get the In-Reply-To header for each message and build a
> dictionary of { Message-ID: message } pairs? Or is it more complicated
> than that?


<http://www.jwz.org/doc/threading.html> has a good write-up about the
threading algorithm used by Netscape Mail and News 2.0 and 3.0, and
Grendel (<http://www.mozilla.org/projects/grendel/>). Jamie Zawinski
was responsible for the design of Netscape Mail and News 2.0 and 3.0.

> If there isn't already a module to do this (and apologies if there is
> one and I don't know about it), are the current tools of choice the
> 'email' and 'mailbox' modules? (And I guess I'd want to use the mime
> decoding tools in 'email' to deal with messages that come with
> attachments or html or other stuff.)


A.M. Kuchling has made a Python implementation of JWZ's algorithm
available at <http://www.amk.ca/python/code/jwz>.
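
For a feel of the core idea, here is a heavily simplified sketch loosely following the first steps of the JWZ write-up: one container per Message-ID (including IDs that are only ever referenced), linked along each References chain. It is not Kuchling's code; the names `Container`, `link`, and `build_threads` are invented for this example, and the later stages of the algorithm (pruning empty containers, grouping by subject, sorting) are left out. Input is assumed to be `(msg_id, references, payload)` tuples with `references` ordered oldest first.

```python
class Container:
    """Holds one node of the thread tree; message stays None for IDs
    that were referenced but never seen in the mailbox."""

    def __init__(self, msg_id):
        self.msg_id = msg_id
        self.message = None
        self.parent = None
        self.children = []

    def has_descendant(self, other):
        stack = list(self.children)
        while stack:
            node = stack.pop()
            if node is other:
                return True
            stack.extend(node.children)
        return False


def link(parent, child):
    """Make child a child of parent unless it already has a parent
    or the link would create a loop."""
    if child.parent is not None:
        return
    if parent is child or child.has_descendant(parent):
        return
    child.parent = parent
    parent.children.append(child)


def build_threads(messages):
    """Return the root containers for (msg_id, references, payload) tuples."""
    table = {}

    def container_for(msg_id):
        if msg_id not in table:
            table[msg_id] = Container(msg_id)
        return table[msg_id]

    for msg_id, references, payload in messages:
        this = container_for(msg_id)
        this.message = payload
        prev = None
        for ref in references:  # link each reference under the one before it
            current = container_for(ref)
            if prev is not None:
                link(prev, current)
            prev = current
        # The last reference is taken as this message's real parent and
        # replaces any parent guessed earlier from another message's chain.
        if prev is not None and prev is not this and not this.has_descendant(prev):
            if this.parent is not None:
                this.parent.children.remove(this)
                this.parent = None
            link(prev, this)
    return [c for c in table.values() if c.parent is None]
```

Because containers exist for missing messages too, replies to a message that never made it into the mailbox still end up as siblings under one placeholder root.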

> Thanks for any tips,
>
> Jed


Regards,

Mark Rowe
<http://bdash.net.nz/>

 
Jed Parsons
      10-19-2004
Awesome! Thanks so much.

j

 
Matthew Dixon Cowles
      10-19-2004
In article <(E-Mail Removed)>,
"Jed Parsons" <(E-Mail Removed)> wrote:

[Apologies if this arrives twice; my news server appears to have dropped
my initial post.]

The best algorithm for threading email messages I've seen is Jamie
Zawinski's. It's at:

http://www.jwz.org/doc/threading.html

I have an implementation in Python that's linked from:

http://www.mondoinfo.com/blog/C18226...988/index.html

Regards,
Matt
 