Velocity Reviews - Computer Hardware Reviews


understand Mozilla Thunderbird files...

 
 
Joh
      11-27-2004
Hello,

I would like to write a Python script to remove duplicate emails from
Mozilla Thunderbird inbox files, or at least dump their email
headers to a text file.

Where can I find some URLs that explain how these files are
formatted?

Thanks
 
Jürgen Harter
      11-27-2004
Joh wrote:

> I would like to write a Python script to remove duplicate emails from
> Mozilla Thunderbird inbox files, or at least dump their email
> headers to a text file.
>
> Where can I find some URLs that explain how these files are
> formatted?


Search for "mbox format"!

J
 
Ralph Fox
      11-28-2004
On 27 Nov 2004 09:21:05 -0800, in message
<> , Joh wrote:

> I would like to write a Python script to remove duplicate emails from
> Mozilla Thunderbird inbox files...


The AWK script below will do this.


> or at least dump their email
> headers to a text file.
>
> Where can I find some URLs that explain how these files are
> formatted?



1. Each email folder in Mozilla or Mozilla Thunderbird corresponds to
two files. For example, for the "Inbox" folder there are two
files:

Inbox
Inbox.msf

1.1 The file "Inbox" (no extension) is in mbox file format.

A web search will turn up many descriptions of the mbox file
format.

Also see note 2 below.

1.2 The file "Inbox.msf" is a summary file. It will be
re-created if it does not exist.
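Since the original question mentioned Python: Python 3's standard `mailbox` module can read an mbox file such as "Inbox" directly. Below is a minimal sketch of the "dump the headers to a text file" idea; the function name `dump_headers` and the default list of fields are illustrative assumptions, not anything Thunderbird defines.

```python
import mailbox

def dump_headers(mbox_path, out, fields=("Date", "From", "Subject", "Message-ID")):
    """Write one tab-separated line of selected headers per message in an mbox file."""
    box = mailbox.mbox(mbox_path, create=False)  # raises if the file does not exist
    try:
        for msg in box:
            # msg.get() looks headers up case-insensitively; a missing header becomes "".
            # Folded (multi-line) header values are collapsed onto one line.
            values = [" ".join(str(msg.get(name, "")).split()) for name in fields]
            out.write("\t".join(values) + "\n")
    finally:
        box.close()
```

Passing `sys.stdout` (or any file opened for writing) as `out` gives one line per message, which can be redirected to a text file. Note that the "From" field here is the `From:` header, not the mbox separator line.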


2. Mozilla adds two proprietary headers to received email messages
to hold message status flags (e.g. read/unread, flagged, marked
as deleted, etc.). If you look at the messages in Mozilla
Thunderbird's mbox files, you will see these two extra headers
(possibly with different values):

| X-Mozilla-Status: 8001
| X-Mozilla-Status2: 00000000
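The status value is a hexadecimal bit field, so it can be decoded with a few lines of Python. The bit assignments used here (0x0001 read, 0x0002 replied, 0x0004 flagged, 0x0008 marked as deleted) are an assumption based on Mozilla's message-flag definitions and should be checked against the Mozilla source; `decode_mozilla_status` is a hypothetical helper name.

```python
# Assumed bit values for X-Mozilla-Status (verify against Mozilla's source).
MOZ_STATUS_FLAGS = {
    0x0001: "read",
    0x0002: "replied",
    0x0004: "flagged",
    0x0008: "deleted",  # marked as deleted; physically removed by Compact
}

def decode_mozilla_status(value):
    """Return the names of the assumed flags set in a value such as "8001"."""
    bits = int(value.strip(), 16)  # the header value is hexadecimal
    return [name for bit, name in MOZ_STATUS_FLAGS.items() if bits & bit]
```

With this table, `decode_mozilla_status("8001")` returns `["read"]`; the 0x8000 bit is simply not decoded by this sketch.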


3. Below is an AWK script that removes duplicate emails from an mbox-format
file. Duplicates are detected by comparing the message-IDs.

You can use this script on Mozilla Thunderbird's mbox-format
mail files. For example, with the "Inbox" email folder:

3.1 First compact the "Inbox" email folder in Mozilla.
3.2 Run the file "Inbox" through this AWK script to generate
an output file (say) "Inbox_dedup.tmp", and then replace the
file "Inbox" with "Inbox_dedup.tmp".
3.3 Delete the file "Inbox.msf".
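Steps 3.2 and 3.3 can also be scripted. This is a minimal Python sketch that assumes Thunderbird is closed while it runs and that the deduplicated output was written to a temporary file in the same directory; `install_dedup_output` is a hypothetical name.

```python
import os

def install_dedup_output(folder_path, tmp_path):
    """Replace an mbox folder file with its deduplicated copy and drop the stale summary.

    Assumes Thunderbird is not running and tmp_path is on the same filesystem.
    """
    # Step 3.2: os.replace overwrites folder_path with tmp_path in a single rename.
    os.replace(tmp_path, folder_path)
    # Step 3.3: delete "<folder>.msf"; Thunderbird rebuilds it on next access.
    summary_path = folder_path + ".msf"
    if os.path.exists(summary_path):
        os.remove(summary_path)
```

For the example above this would be called as `install_dedup_output("Inbox", "Inbox_dedup.tmp")` from inside the mail directory.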


4. When you delete a message from a Mozilla Thunderbird email folder,
Mozilla Thunderbird does not immediately remove the message from
the mbox file. Instead, Mozilla Thunderbird sets a status flag
to indicate that the message is marked as deleted. The deleted
message is removed when you "Compact" the folder in Mozilla
Thunderbird.

The AWK script below will not notice that a message has been
marked as deleted (it does not check the proprietary Mozilla flags).

So if you don't compact the email folder first, you run the risk
of the following situation.

Risk:
• There are two copies of the same message (a duplicate).
• The first copy in the file is marked as deleted; the second is not.
• The AWK script keeps the first copy, sees that the second is a
duplicate, and removes it. The only copy left is the one marked
as deleted.
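One way to avoid this risk is to skip messages carrying the deleted flag before comparing Message-IDs. The Python sketch below does that with the standard `mailbox` module. It assumes 0x0008 is the "marked as deleted" bit of X-Mozilla-Status (check this against Mozilla's source), keeps messages with no Message-ID (the same policy as the AWK script), and uses the hypothetical names `is_marked_deleted` and `dedup_mbox`.

```python
import mailbox

DELETED_FLAG = 0x0008  # assumed "marked as deleted" bit in X-Mozilla-Status

def is_marked_deleted(msg):
    """True if X-Mozilla-Status has the assumed deleted bit set."""
    status = msg.get("X-Mozilla-Status")
    if status is None:
        return False
    try:
        return bool(int(str(status).strip(), 16) & DELETED_FLAG)
    except ValueError:
        return False  # malformed value: treat as not deleted

def dedup_mbox(src_path, dst_path):
    """Copy src_path to a new mbox at dst_path, dropping deleted messages and
    repeated Message-IDs. Returns the number of messages dropped."""
    src = mailbox.mbox(src_path, create=False)
    dst = mailbox.mbox(dst_path)  # created if missing; messages are appended
    seen, dropped = set(), 0
    try:
        for msg in src:
            if is_marked_deleted(msg):
                dropped += 1  # a deleted copy never claims its Message-ID
                continue
            raw_id = msg.get("Message-ID")  # header lookup is case-insensitive
            if raw_id is not None:
                msg_id = str(raw_id).strip()
                if msg_id in seen:
                    dropped += 1
                    continue
                seen.add(msg_id)
            dst.add(msg)  # keeps the "From " line and the Mozilla headers
    finally:
        dst.close()  # close() flushes pending additions before closing
        src.close()
    return dropped
```

Because deleted copies are skipped before their Message-ID is recorded, the surviving copy in the scenario above is the one that was not marked as deleted. The destination should be a new, empty file such as "Inbox_dedup.tmp".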


5. The script dedup_mbox.awk

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#!/usr/bin/awk -f
#
# ---------------------------------------------------------------
#
# Removes duplicate messages from an mbox file.
# Duplicates are identified by message-id.
#
# ---------------------------------------------------------------


BEGIN {

    # this is a state-driven program
    # states are "INIT", "SCAN", "SKIP" and "COPY"

    state = "INIT" ;

    backbuffer = "" ;

}

/^From / {

    # separator line indicating the start of a message in the file

    if ( state == "SCAN" )
    {
        # We only get here when we encounter a new message in the file,
        # and the previous message in the file had no body and no
        # message-id header.

        # The code below makes a policy decision to treat any message
        # with no message-id header as not a duplicate.

        if ( backbuffer != "" )
            printf( "%s", backbuffer ) ;
        backbuffer = "" ;
    }

    # set state to scanning for a message-id header,
    # saving text to print if this is found to be a new message.

    state = "SCAN" ;
    backbuffer = "" ;
}

/^Message-ID: / {

    # if we are in SCAN state, then this is the message-id header we are looking for.

    if ( state == "SCAN" )
    {
        # in header and not yet seen a message-id header

        message_id = $2 ;

        if ( have_seen[ message_id ] == "YES" )
        {
            # this is a duplicate message
            #printf "Duplicate: %s\n", message_id | "cat 1>&2;" ;

            backbuffer = "" ;
            state = "SKIP" ;
        }
        else
        {
            # not a duplicate.
            #printf "Original: %s\n", message_id | "cat 1>&2;" ;

            have_seen[ message_id ] = "YES" ;

            # print from separator and all header lines before this.

            if ( backbuffer != "" )
                printf( "%s", backbuffer ) ;
            backbuffer = "" ;
            state = "COPY" ;
        }
    }
}

/^$/ {

    # empty line, possibly the boundary between headers and body

    if ( state == "SCAN" )
    {
        # We only get here when we reach the end of the header
        # and there was no message-id in header.

        # The code below makes a policy decision to treat any message
        # with no message-id header as not a duplicate.

        if ( backbuffer != "" )
            printf( "%s", backbuffer ) ;
        backbuffer = "" ;
        state = "COPY" ;
    }
}

{
    # any line

    if ( state == "SCAN" )
    {
        backbuffer = backbuffer $0 "\n" ;
    }
    else
    if ( state == "SKIP" )
    {
        backbuffer = "" ;
    }
    else
    {
        printf( "%s%s\n", backbuffer, $0 ) ;
        backbuffer = "" ;
    }
}

END {

    if ( state == "SCAN" )
    {
        # We only get here when we reach the end of the file,
        # and the last message in the file had no body and no
        # message-id header.

        # The code below makes a policy decision to treat any message
        # with no message-id header as not a duplicate.

        if ( backbuffer != "" )
            printf( "%s", backbuffer ) ;
        backbuffer = "" ;
    }

    state = "INIT" ;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
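For readers who prefer Python (as in the original question), here is a minimal line-by-line port of the same INIT/SCAN/SKIP/COPY state machine: each line runs through the same four rules in the same order as in the AWK script. It is a sketch under those assumptions rather than a drop-in replacement; it deliberately keeps the script's case-sensitive "Message-ID: " match and its "second field" rule for the ID, and the names `dedup_lines` and `dedup_file` are hypothetical.

```python
def dedup_lines(lines, out):
    """Port of dedup_mbox.awk: every input line passes through the same rules in order."""
    state, backbuffer, seen = "INIT", [], set()
    for line in lines:
        text = line.rstrip("\n")
        # /^From / : a new message starts
        if text.startswith("From "):
            if state == "SCAN":  # previous message had no body and no Message-ID: keep it
                out.writelines(backbuffer)
            state, backbuffer = "SCAN", []
        # /^Message-ID: / : first Message-ID header of the current message
        if state == "SCAN" and text.startswith("Message-ID: "):
            fields = text.split()
            msg_id = fields[1] if len(fields) > 1 else ""  # AWK's $2
            if msg_id in seen:
                state, backbuffer = "SKIP", []
            else:
                seen.add(msg_id)
                out.writelines(backbuffer)
                state, backbuffer = "COPY", []
        # /^$/ : end of headers reached without a Message-ID
        if state == "SCAN" and text == "":
            out.writelines(backbuffer)
            state, backbuffer = "COPY", []
        # catch-all rule, applied to every line
        if state == "SCAN":
            backbuffer.append(line)
        elif state == "SKIP":
            backbuffer = []
        else:
            out.writelines(backbuffer)
            out.write(line)
            backbuffer = []
    if state == "SCAN":  # END block
        out.writelines(backbuffer)

def dedup_file(src_path, dst_path):
    # latin-1 maps each byte to one character, so 8-bit mail passes through unchanged;
    # newline="" keeps the original line endings.
    with open(src_path, encoding="latin-1", newline="") as src, \
         open(dst_path, "w", encoding="latin-1", newline="") as dst:
        dedup_lines(src, dst)
```

`dedup_file("Inbox", "Inbox_dedup.tmp")` plays the same role as running the AWK script in step 3.2.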


--
Cheers,
Ralph

"Curiosity skilled the cat."

 
smalolepszy
      12-03-2004
I have two questions:

1. Do you have a specification of the .msf file format? I saw the
subject, from, etc. in this file.

2. Do you know where I can find the Thunderbird folders? Is there
any information about their location in the Windows registry? And if a
user has several profiles, which one is the main profile?

Thanks.
 
Joh
      12-03-2004
Exactly what I need.

Thanks to you both, I understand the format now and have a ready-to-use tool.
 
Moz Champion
      12-07-2004
smalolepszy wrote:

> I have two questions:
>
> 1. Do you have a specification of the .msf file format? I saw the
> subject, from, etc. in this file.
>
> 2. Do you know where I can find the Thunderbird folders? Is there
> any information about their location in the Windows registry? And if a
> user has several profiles, which one is the main profile?
>
> Thanks.


A .msf file is a Mail Summary file. One is created as required when
Mozilla accesses a folder/file that doesn't already have one. Since
Mozilla does this automatically, the usual response to any problem with
these files is simply to delete them while Mozilla is not running;
Mozilla will then rewrite and recreate them the next time the user
accesses the file/folder. They can be identified in the Mail folder by
the .msf suffix, of course. For example, for the Inbox there will be a
file called Inbox, a file called Inbox.msf (the summary/index), and
perhaps an Inbox.sbd directory (if there are subfolders in the Inbox itself).
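This naming scheme can be walked programmatically. The minimal Python sketch below lists the folders in a Mail directory by treating each `name.msf` that sits next to a file called `name` as a folder, and recursing into `name.sbd` subdirectories. That detection rule is an assumption (a folder whose .msf was just deleted would be missed), and `list_mail_folders` is a hypothetical name.

```python
import os

def list_mail_folders(mail_dir, prefix=""):
    """Yield folder paths such as "Inbox" or "Inbox/Work" under a Mozilla Mail directory."""
    for name in sorted(os.listdir(mail_dir)):
        if not name.endswith(".msf"):
            continue
        folder = name[:-len(".msf")]
        if not os.path.isfile(os.path.join(mail_dir, folder)):
            continue  # summary without a matching mbox file: skip it
        yield prefix + folder
        subfolders = os.path.join(mail_dir, folder + ".sbd")
        if os.path.isdir(subfolders):  # e.g. "Inbox.sbd" holds Inbox's subfolders
            yield from list_mail_folders(subfolders, prefix + folder + "/")
```

Other files in the directory (for example .dat files) are ignored because they have no matching .msf summary.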

The precise location of your profile folder depends on the version you
are running as well as on your operating system.
In the current version of Thunderbird, use the Help menu to view the
release notes; the profile location for each system is listed there.

If you have multiple profiles, you should know which one you are
using, because you chose it at startup.

--
Mozilla Champion
UFAQ - http://www.UFAQ.org
Mozilla Champions - http://mozillachampions.mozdev.org
Mozilla Manual - http://mozmanual.mozdev.org/
 
xiaotam@gmail.com
      12-11-2004
Thanks for that script, it came in very handy.
I suggest changing the line

/^Message-ID: / {

to

/^Message-I[Dd]: / {

because it seems sometimes it's 'Id', not 'ID'.
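Header field names are case-insensitive under RFC 2822, so `Message-Id`, `MESSAGE-ID` and other spellings are all legal. In Python, a single case-insensitive regular expression covers every variant. This is a sketch with a hypothetical helper name, and it does not handle a Message-ID folded onto the following line.

```python
import re

# Matches "Message-ID", "Message-Id", "message-id", ... and allows any
# amount of whitespace (including none) after the colon.
MESSAGE_ID_HEADER = re.compile(r"^message-id:\s*(\S+)", re.IGNORECASE)

def header_message_id(line):
    """Return the Message-ID value on a header line, or None if the line is not one."""
    m = MESSAGE_ID_HEADER.match(line)
    return m.group(1) if m else None
```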

Cheers,
Nick



Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post
Mozilla/Netscape - don't understand error | Neil | Javascript | 2 | 09-28-2006 02:13 PM
Read all of this to understand how it works. then check around on other the | lisa martin | Computer Support | 2 | 08-18-2005 06:40 AM
Thunderbird 1.0 install over Thunderbird 0.8 | mapmaker | Firefox | 4 | 03-05-2005 12:16 AM
My CUSTOM Versions Of Mozilla Fiorefox & Mozilla Thunderbird | Norvin Adams III | Firefox | 6 | 07-13-2004 03:26 PM
Can you help me understand this javascript example that runs on IE and Opera and crashes under Mozilla? | Jerry Asher | Javascript | 0 | 07-15-2003 06:12 AM


