[script] dis/assembling mbox email

 
 
William Park
      06-10-2004
Crossposted to the Python group, because I think this is a cleaner
approach.

From time to time, I need to
- extract the main header/body from a MIME email,
- parse and extract multipart segments, recursively,
- walk through the email tree, and edit/delete/add parts,
- regenerate a new MIME email.

You can edit the file manually, but it's difficult to keep track of
where you are. So, I wrote shell scripts (included below my signature):
1. unmbox.sh -- to extract email components into a directory tree
2. mbox.sh -- to generate an email from a directory tree
So, you can "walk" through a MIME email by simply "walking" through
the directory tree.

The analogy is a 'tar' file: you extract files into a directory tree, and
you create a tarball from the directory tree. Or, if you are using
Slackware, the analogy is 'explodepkg' and 'makepkg'.


Usage is
    unmbox.sh dir < email
    mbox.sh dir > email

'unmbox.sh' extracts the email components into a directory tree. The
header and body are saved as the files 'header' and 'body', respectively.
If the email is MIME multipart, each segment is saved as an
'xx[0-9][0-9]' file, which is in turn decomposed recursively into a
matching 'xx[0-9][0-9].mbox' directory. In reverse, 'mbox.sh' recursively
walks the directory tree and assembles the email components into mbox
format.
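
To make the 'xx[0-9][0-9]' naming concrete, here is a minimal sketch of how a
multipart body gets cut into segments. It is not part of the scripts: the
body text and the boundary "frontier" are invented, and awk stands in for the
csplit call that unmbox.sh actually uses, so that the sketch does not depend
on GNU csplit options.

```shell
#!/bin/sh
# Work in a scratch directory so the sketch leaves nothing behind elsewhere.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Invented two-part body; "frontier" is a made-up boundary pattern.
cat > body <<'EOF'

preamble text
--frontier
Content-Type: text/plain

first part
--frontier
Content-Type: text/plain

second part
--frontier--

EOF

# Start a new xxNN file at every line that begins with --frontier.
# (awk is a stand-in here; unmbox.sh makes this cut with csplit.)
awk '
    /^--frontier/ { n++ }
    { file = sprintf("xx%02d", n); print > file }
' body

ls xx??

# Only segments whose first line is exactly --frontier hold a MIME part;
# the preamble (xx00) and the closing --frontier-- segment are skipped.
for i in xx??; do
    if [ "$(head -n 1 "$i")" = "--frontier" ]; then
        echo "$i would be passed to unmbox.sh as $i.mbox"
    fi
done
```

Each segment that passes the check keeps its own header and body, which is
why the same script can be applied to it again.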

Strictly speaking, a MIME boundary pattern consists of any number of
[A-Za-z0-9 '()+_,./-]
not ending in a space. And a boundary line in the message body consists of
\n--pattern\n
\n--pattern--\n
where 'pattern' is the boundary pattern assigned in the Content-Type:
header.

For the sake of sanity,

1. The script recognizes only
boundary="..."
as the MIME boundary parameter, i.e. it must be double-quoted, with no
spaces around '='.

2. Only lines consisting of '--pattern' or '--pattern--' are recognized
as boundary lines, because Formail puts a blank line (if one doesn't
already exist) at the top and bottom of the email body, undoing the '\n'
prefix/suffix anyway.

3. '.' needs to be escaped for Sed and Grep, and '()+.?' need to be
escaped for Csplit and Egrep.
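
The three rules above can be sketched in a few lines of shell. This is only
an illustration under stated assumptions, not the scripts themselves: the
sample header line, the boundary value, and the names `header_line`,
`assignment`, `pattern`, `escaped`, and `is_boundary_line` are all made up
for the example, and `grep -o` is assumed to be available.

```shell
#!/bin/sh
# Hypothetical Content-Type line; the boundary value is invented.
header_line='Content-Type: multipart/mixed; boundary="next.part(1)+x"'

# Rule 1: accept only the double-quoted form, with no spaces around '='.
assignment=$(printf '%s\n' "$header_line" |
    grep -o "boundary=\"[A-Za-z0-9 '()+_,./-]*[A-Za-z0-9'()+_,./-]\"")

# Strip the boundary=" prefix and the closing quote.
pattern=${assignment#boundary=\"}
pattern=${pattern%\"}

# Rule 3: backslash-escape the characters that are special to egrep.
escaped=$(printf '%s\n' "$pattern" | sed 's/[()+.?]/\\&/g')

# Rule 2: only whole lines '--pattern' or '--pattern--' are boundary lines.
is_boundary_line() {
    printf '%s\n' "$1" | grep -E "^--$escaped(--)?\$" > /dev/null
}

echo "pattern: $pattern"
echo "escaped: $escaped"
is_boundary_line "--$pattern" && echo "opening line recognized"
is_boundary_line "--$pattern--" && echo "closing line recognized"
```

Without the escaping step, the '.' in the pattern would match any character,
so a line such as '--nextXpart(1)+x' could be mistaken for a boundary.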


Use at your own risk, and enjoy.
--
William Park, Open Geometry Consulting, <(E-Mail Removed)>
No, I will not fix your computer! I'll reformat your harddisk, though.


-----------------------------------------------------------------------

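#! /bin/sh
# Usage: unmbox.sh dir < email
# Needs formail, grep -o, and GNU csplit ('{*}'); unmbox.sh must be on PATH.

[ ! -d "$1" ] && mkdir "$1"

cd "$1" || exit 1
cat > input
formail -f -X '' < input > header    # no blank lines
formail -I '' < input > body         # blank lines at top/bottom

if grep -o "boundary=\"[A-Za-z0-9 '()+_,./-]*[A-Za-z0-9'()+_,./-]\"" header > boundary; then
    . ./boundary                     # the file holds boundary="...", so this sets $boundary
    # $(...) instead of backticks: inside backticks the shell turns '\\&'
    # into '\&', which sed treats as a literal '&' instead of an escape.
    eboundary=$(printf '%s\n' "$boundary" | sed 's/[()+.?]/\\&/g')
    csplit body "/^--$eboundary/" '{*}'    # xx00, xx01, ...
    for i in xx??; do
        # Recurse only into segments that start with an opening boundary line.
        if head -n 1 "$i" | egrep "^--$eboundary\$" > /dev/null; then
            sed '1d' "$i" | unmbox.sh "$i.mbox"
        fi
    done
else
    rm boundary
fi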

-----------------------------------------------------------------------

#! /bin/sh
# Usage: mbox.sh dir > email
# mbox.sh must be on PATH for the recursive call.

cd "$1" || exit 1
sed '/^$/ d' header    # NO blank lines in header

if [ -f boundary ]; then
    . ./boundary       # sets $boundary
    echo
    for i in xx??.mbox; do
        echo "--$boundary"
        mbox.sh "$i"
    done
    echo "--$boundary--"
    echo
else
    [ "`head -n 1 body`" ] && echo    # blank line at top
    cat body
    [ "`tail -n 1 body`" ] && echo    # blank line at bottom
    :    # dummy, so that return code is 0
fi

-----------------------------------------------------------------------
 
Alan Connor
      06-10-2004
On 10 Jun 2004 21:34:04 GMT, William Park <(E-Mail Removed)> wrote:
>



<snip>

> mbox.sh $i
> done
> echo "--$boundary--"
> echo
> else
> [ "`head -1 body`" ] && echo # blank line at top
> cat body
> [ "`tail -1 body`" ] && echo # blank line at bottom
> : # dummy, so that return code is 0
> fi
>
> -----------------------------------------------------------------------


Thanks, William. Tucked it away. Could come in REAL handy.


AC

--
http://angel.1jh.com./nanae/kooks/alanconnor.html
http://www.killfile.org./dungeon/why/connor.html
 
Skip Montanaro
      06-10-2004

William> Time to time, I need to
William> - extract main header/body from a MIME email,
William> - parse and extract multipart segments, recursively,
William> - walk through the email tree, and edit/delete/add stuffs
William> - regenerate new MIME email.

...

William> Usage are
William> unmbox.sh dir < email
William> mbox.sh dir > email

...

You might be interested in the splitndirs.py script which is part of the
Spambayes distribution. There is no joindirs.py script, but it's perhaps a
five-line script using the mboxutils.getmbox function (also part of
Spambayes).

Skip

 
those who know me have no need of my name
      06-11-2004
[fu-t set]

in comp.mail.misc i read:

> Crossposted to Python group, because I think this is cleaner
> approach.


but with not an ounce of python in your solution. and no followup-to.
sad.

--
a signature
 
William Park
      06-11-2004
In <comp.unix.shell> Skip Montanaro <(E-Mail Removed)> wrote:
>
> William> Time to time, I need to
> William> - extract main header/body from a MIME email,
> William> - parse and extract multipart segments, recursively,
> William> - walk through the email tree, and edit/delete/add stuffs
> William> - regenerate new MIME email.


> William> Usage are
> William> unmbox.sh dir < email
> William> mbox.sh dir > email


> You might be interested in the splitndirs.py script which is part of
> the Spambayes distribution. There is no joindirs.py script, but it's
> perhaps a five-line script using the mboxutils.getmbox function (also
> part of Spambayes).


I think splitndirs.py is Python's version of
formail -s
Of course, the inverse is simply to concatenate the files, and that
would be a one-liner.
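
A minimal sketch of that one-liner, under assumptions: each file is taken to
hold exactly one message that already starts with an mbox "From " line and
ends with a blank line. The directory name `msgs`, the file names, and the
message contents are invented for the example.

```shell
#!/bin/sh
# Scratch directory with two invented single-message files.
workdir=$(mktemp -d) && cd "$workdir" || exit 1
mkdir msgs
printf 'From alice Thu Jun 10 21:34:04 2004\nSubject: one\n\nfirst\n\n' > msgs/0001
printf 'From bob Fri Jun 11 08:00:00 2004\nSubject: two\n\nsecond\n\n' > msgs/0002

# The one-liner: the shell expands msgs/* in sorted order.
cat msgs/* > combined.mbox

# Count the messages by their "From " separator lines.
grep -c '^From ' combined.mbox
```

If the files lack the "From " separator line, plain concatenation does not
produce a valid mbox, and something has to add that line first.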

--
William Park, Open Geometry Consulting, <(E-Mail Removed)>
No, I will not fix your computer! I'll reformat your harddisk, though.
 
William Park
      06-11-2004
In <comp.unix.shell> William Park <(E-Mail Removed)> wrote:
> Strictly speaking, MIME boundary pattern consists of any number of
> [ A-Za-z0-9'()+_,./-]

....
> if grep -o "boundary=\"[ A-Za-z0-9'()+_,./-]*[A-Za-z0-9'()+_,./-]\"" header > boundary; then


Typo:
Add '=' (equal sign) to the regexp above.
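
A small sketch of what the correction changes, assuming '=' is added to both
bracket expressions. The header line and the names `header_line`, `old`, and
`new` are invented for the example. RFC 2046 also allows ':' and '?' in a
boundary; they are left out here to stay close to the post.

```shell
#!/bin/sh
# Invented header line with a boundary that contains '='.
header_line='Content-Type: multipart/alternative; boundary="----=_Part_42"'

# Character class as first posted: no '=', so the boundary is not found.
old=$(printf '%s\n' "$header_line" |
    grep -o "boundary=\"[A-Za-z0-9 '()+_,./-]*[A-Za-z0-9'()+_,./-]\"")

# With '=' added to both bracket expressions, the boundary is found.
new=$(printf '%s\n' "$header_line" |
    grep -o "boundary=\"[A-Za-z0-9 '()+_,./=-]*[A-Za-z0-9'()+_,./=-]\"")

echo "old: ${old:-no match}"
echo "new: ${new:-no match}"
```

Note that '-' stays last inside the brackets so that it is taken literally
and not as a range.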

--
William Park, Open Geometry Consulting, <(E-Mail Removed)>
No, I will not fix your computer! I'll reformat your harddisk, though.
 