Removing base64 from mbox formatted file.

 
 
me at
Guest
Posts: n/a
 
      12-26-2008

Hi,

I have a text based email application I use on my personal ISP
shell account. The file size of the mbox grows rather large
because of encoded attachments. I have been editing it by
searching for, [Bb][Aa][Ss][Ee]64, in vi/vim and marking and
deleting. It is very time consuming, and error prone.

I wonder if you might have a trick for the following example that
would delete all lines between the
base64 and the
--0__

Maybe a little perl script I could run against the file.

Thanks,

Vic


Content-transfer-encoding: base64

V5cJPPkIjFDdeEabQbd6WgICTxiiz0f5dBKquXF6k4senwEhYGnKEFJeGrxUZy8dB8gmAXI/sPvH
ESfCwVt5hTgYiqQqtdRNHQIU1PJ33ZqmzgE90OwLaoJcnMop1WiMmgkPHQRIrwgFuNV90A3doNKT
mrKIN07AnGcI9BQjhCBN4RfA1qIZnMqorJCogKfGQnxSCDilTVIA0yl5ciTovgLuBDKFUDE9aQcw
9SA+rjSNf9/M1gxrj6VwDTS0IUSElMzBfsj0NFXR2kwsV1A5IF1grLgLL/r1R40BZEnuBWgm
9SA+QEyb

--0__=07BBFF96DFCC19C68f9e8a93df938690918c07BBFF96DFCC19C6--































 
Tad J McClellan
Guest
Posts: n/a
 
      12-26-2008
me at <> wrote:

> delete all lines between the
> base64 and the
> --0__



> Content-transfer-encoding: base64
>
> V5cJPPkIjFDdeEabQbd6WgICTxiiz0f5dBKquXF6k4senwEhYGnKEFJeGrxUZy8dB8gmAXI/sPvH
> ESfCwVt5hTgYiqQqtdRNHQIU1PJ33ZqmzgE90OwLaoJcnMop1WiMmgkPHQRIrwgFuNV90A3doNKT
> mrKIN07AnGcI9BQjhCBN4RfA1qIZnMqorJCogKfGQnxSCDilTVIA0yl5ciTovgLuBDKFUDE9aQcw
> 9SA+rjSNf9/M1gxrj6VwDTS0IUSElMzBfsj0NFXR2kwsV1A5IF1grLgLL/r1R40BZEnuBWgm
> 9SA+QEyb
>
> --0__=07BBFF96DFCC19C68f9e8a93df938690918c07BBFF96DFCC19C6--



perl -n -i -e 'print unless /^Content-transfer-encoding: base64/ .. /^--0__' file.mbox
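
For readers unfamiliar with it: in scalar context the ".." operator acts as a flip-flop. It turns true on the line where the left pattern matches and stays true up to and including the line where the right pattern matches. A minimal sketch on in-memory lines, assuming a boundary like the one in the question (the sub name strip_range and the sample data are made up for illustration):

```perl
use strict;
use warnings;

# Return the lines that fall outside the start..end range.
# The range is inclusive: the start line and the end line are both dropped.
sub strip_range {
    my ($start_re, $end_re, @lines) = @_;
    my @kept;
    for (@lines) {
        # Scalar-context ".." is the flip-flop: true from the line matching
        # $start_re through the line matching $end_re, false otherwise.
        push @kept, $_ unless /$start_re/ .. /$end_re/;
    }
    return @kept;
}

my @sample = (
    "Some header: value\n",
    "Content-transfer-encoding: base64\n",
    "\n",
    "9SA+QEyb\n",
    "\n",
    "--0__=example--\n",
    "text after the part\n",
);

print strip_range(qr/^Content-transfer-encoding: base64/, qr/^--0__/, @sample);
```

Only the first and the last sample line should survive, which is what the one-liner does to each such range in the file.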


--
Tad McClellan
email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"
 
Peter J. Holzer
Guest
Posts: n/a
 
      12-27-2008
On 2008-12-26 22:59, Tad J McClellan <> wrote:
> me at <> wrote:
>> delete all lines between the
>> base64 and the
>> --0__

>
>
>> Content-transfer-encoding: base64
>>
>> V5cJPPkIjFDdeEabQbd6WgICTxiiz0f5dBKquXF6k4senwEhYGnKEFJeGrxUZy8dB8gmAXI/sPvH
>> ESfCwVt5hTgYiqQqtdRNHQIU1PJ33ZqmzgE90OwLaoJcnMop1WiMmgkPHQRIrwgFuNV90A3doNKT
>> mrKIN07AnGcI9BQjhCBN4RfA1qIZnMqorJCogKfGQnxSCDilTVIA0yl5ciTovgLuBDKFUDE9aQcw
>> 9SA+rjSNf9/M1gxrj6VwDTS0IUSElMzBfsj0NFXR2kwsV1A5IF1grLgLL/r1R40BZEnuBWgm
>> 9SA+QEyb
>>
>> --0__=07BBFF96DFCC19C68f9e8a93df938690918c07BBFF96DFCC19C6--

>
>
> perl -n -i -e 'print unless /^Content-transfer-encoding: base64/ .. /^--0__' file.mbox


A perfect example of why it is sometimes not a good idea to answer the
question as asked. A message part in a MIME-encoded message does not
always start and end with "--0__". The delimiter has to be extracted
from the Content-Type header. Also base64 encoding is not limited to
"attachments". Some mail programs (e.g. Lotus Notes) tend to encode even
normal ASCII text in base64.
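
To illustrate the last point, here is a small sketch using the core MIME::Base64 module. It shows ordinary ASCII text going through base64 and back, which is why "base64" alone does not identify an attachment. The sample sentence is invented for this example.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64 decode_base64);

# Plain ASCII text, as some mail programs encode it even though it is
# not an attachment (sample text invented for this sketch).
my $plain   = "Hello Vic, this is an ordinary text body.\n";
my $encoded = encode_base64($plain);    # what would sit in the mbox
my $decoded = decode_base64($encoded);  # what the recipient actually reads

print "encoded: $encoded";
print "decoded: $decoded";
```

Deleting every part marked base64 would therefore also delete message text of this kind.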

hp
 
 
me at
Guest
Posts: n/a
 
      12-27-2008
Sat, 27 Dec 2008 12:09:47 +0100 Peter J. Holzer <hjp-> wrote:
| A perfect example of why it is sometimes not a good idea to answer the
| question as asked. A message part in a MIME-encoded message does not
| always start and end with "--0__". The delimiter has to be extracted
| from the Content-Type header. Also base64 encoding is not limited to
| "attachments". Some mail programs (e.g. Lotus Notes) tend to encode even
| normal ASCII text in base64.


Yup, you're right, I don't know what I am doing, but a lot of work
and experimenting shows that not all encodings are the same. I have
changed my search. Oh, and I had to add the last / to make the searches
work, and change print to printf?

I have changed my search to

perl -n -i -e 'printf unless /^[Cc]ontent-[Tt]ransfer-[Ee]ncoding/ .. /^------_=_/' mbox/inbox

Are the brackets [Cc] ok in perl, it seems to work? And I left out
if it is base64 or whatever, I found 7 different encodings,

I wonder do they always end ------_=_ ?
Not as easy for me to figure out.

Thanks,

Vic


 
 
Tim Greer
Guest
Posts: n/a
 
      12-27-2008
me at wrote:

> perl -n -i -e 'printf unless /^[Cc]ontent-[Tt]ransfer-[Ee]ncoding/ .. /^------_=_/' mbox/inbox
>
> Are the brackets [Cc] ok in perl, it seems to work?


Yes, they serve the exact purpose you might be used to in some other
languages. I.e., [Cc] in a regular expression will match either an
upper-case or a lower-case 'c'.
--
Tim Greer, CEO/Founder/CTO, BurlyHost.com, Inc.
Shared Hosting, Reseller Hosting, Dedicated & Semi-Dedicated servers
and Custom Hosting. 24/7 support, 30 day guarantee, secure servers.
Industry's most experienced staff! -- Web Hosting With Muscle!
 
 
Eric Pozharski
Guest
Posts: n/a
 
      12-28-2008
On 2008-12-27, me at <> wrote:
> Sat, 27 Dec 2008 12:09:47 +0100 Peter J. Holzer <hjp-> wrote:
>| A perfect example of why it is sometimes not a good idea to answer the
>| question as asked. A message part in a MIME-encoded message does not
>| always start and end with "--0__". The delimiter has to be extracted
>| from the Content-Type header. Also base64 encoding is not limited to
>| "attachments". Some mail programs (e.g. Lotus Notes) tend to encode even
>| normal ASCII text in base64.
>
>
> Yup, you're right, I don't know what I am doing, but a lot of work
> and experimenting shows that not all encodings are the same. I have
> changed my search. Oh, and I had to add the last / to make the searches
> work, and change print to printf?
>
> I have changed my search to
>
> perl -n -i -e 'printf unless /^[Cc]ontent-[Tt]ransfer-[Ee]ncoding/ .. /^------_=_/' mbox/inbox
>
> Are the brackets [Cc] ok in perl, it seems to work? And I left out
> if it is base64 or whatever, I found 7 different encodings,


Yes, but you'd do better to read L<perlre>, where you'll find the I<i>
modifier. Or even better -- give CPAN a chance; there are modules ready
to parse MIME.
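
A short sketch of what the I<i> modifier buys you: one case-insensitive pattern instead of a bracket pair for every letter. The variable name $is_b64 and the sample header lines are only illustrative.

```perl
use strict;
use warnings;

# Header names are case-insensitive, so match them with /i instead of
# spelling out [Cc], [Tt], [Ee] for each letter.
my $is_b64 = qr/^Content-Transfer-Encoding:\s*base64/i;

for my $line ("Content-transfer-encoding: base64",
              "CONTENT-TRANSFER-ENCODING: BASE64",
              "Content-Transfer-Encoding: 7bit") {
    print "$line => ", ($line =~ $is_b64 ? "match" : "no match"), "\n";
}
```

The first two lines match and the third does not, since its encoding is 7bit.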

> I wonder do they always end ------_=_ ?
> Not as easy for me to figure out.


See RFC 1341. Have a nice read.


--
Torvalds' goal for Linux is very simple: World Domination
Stallman's goal for GNU is even simpler: Freedom
 
 
Bart Lateur
Guest
Posts: n/a
 
      12-28-2008
me at wrote:

>I wonder do they always end ------_=_ ?
>Not as easy for me to figure out.


No. You have to extract the boundary from the MIME header and search for
that, in the specified usage. Or you can cheat.

For example, in one particular MIME mail in my mailbox I see the header

Content-Type: multipart/alternative;
    boundary="b1_20eb16834381951dd528290ba6c2fd76"

The usage of this boundary I see just before the headers of a new
section as

--b1_20eb16834381951dd528290ba6c2fd76

(which as I said, you can use to cheat)

and after the last section as

--b1_20eb16834381951dd528290ba6c2fd76--

The "=" is just a very popular character in delimiters because its usage
is restricted in base64 encoding, so the risk of clashes with data is
very low to non-existent, especially in combination with "_".

See the MIME RFCs (RFC 2045 and RFC 2046) for the details, in particular,
section 5.1 (Multipart Media Type) in RFC 2046.

http://tools.ietf.org/html/rfc2045
http://tools.ietf.org/html/rfc2046

Some extracts:

The Content-Type field for multipart entities requires one
parameter, "boundary". The boundary delimiter line is then
defined as a line consisting entirely of two hyphen characters
("-", decimal value 45) followed by the boundary parameter
value from the Content-Type header field, optional linear
whitespace, and a terminating CRLF.

The body must then contain one or more body parts, each preceded
by a boundary delimiter line, and the last one followed by a
closing boundary delimiter line.

The boundary delimiter line following the last body part is a
distinguished delimiter that indicates that no further body
parts will follow. Such a delimiter line is identical to the
previous delimiter lines, with the addition of two more hyphens
after the boundary parameter value.
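
As a rough sketch of those rules in Perl (not a full MIME parser): one sub pulls the boundary parameter out of an already unfolded Content-Type value, and another decides whether a body line is a delimiter, the closing delimiter, or ordinary data. The sub names extract_boundary and classify_line are made up for this example, and comments or other unusual syntax inside the header are not handled.

```perl
use strict;
use warnings;

# Pull the boundary parameter out of an (unfolded) Content-Type value.
# Handles the quoted and the unquoted form; returns undef if there is none.
sub extract_boundary {
    my ($content_type) = @_;
    return $1 if $content_type =~ /\bboundary="([^"]+)"/i;
    return $1 if $content_type =~ /\bboundary=([^\s;]+)/i;
    return;
}

# Classify a body line relative to a boundary: 'delimiter' starts a new
# part, 'close' ends the multipart, '' means ordinary data.
sub classify_line {
    my ($line, $boundary) = @_;
    return 'close'     if $line =~ /^--\Q$boundary\E--[ \t]*\r?$/;
    return 'delimiter' if $line =~ /^--\Q$boundary\E[ \t]*\r?$/;
    return '';
}

my $boundary = extract_boundary(
    'multipart/alternative; boundary="b1_20eb16834381951dd528290ba6c2fd76"');
print "boundary: $boundary\n";
print classify_line("--$boundary\n",   $boundary), "\n";
print classify_line("--$boundary--\n", $boundary), "\n";
```

The \Q...\E quoting matters because boundary strings often contain characters such as "=" or "." that would otherwise be read as regex syntax.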

--
Bart.
 
 
me at
Guest
Posts: n/a
 
      12-28-2008

Hi,

seems to work.

perl -n -i -e 'printf unless /^[Cc]ontent-[Tt]ransfer-[Ee]ncoding: [Bb][Aa][Ss][Ee]64/ .. /--$/' filename

Thanks for all of the tips.

--
Vic




 
 
Bart Lateur
Guest
Posts: n/a
 
      12-28-2008
me at wrote:

>
>seems to work.
>
> perl -n -i -e 'printf unless /^[Cc]ontent-[Tt]ransfer-[Ee]ncoding: [Bb][Aa][Ss][Ee]64/ .. /--$/' filename
>
>Thanks for all of the tips.
>


You can just make the first regex case-insensitive. For the latter,
I think you'd better sync on leading hyphens instead of trailing
hyphens, because now you're throwing away *everything* from the
base64-encoded attachment up to the next line that ends in "--".

Oh, and you should use print, not printf, or you'll get *big* trouble if
the line contains percent signs.
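
A small sketch of the problem, using sprintf (which follows the same formatting rules as printf) so the result can be inspected. The sample line is invented.

```perl
use strict;
use warnings;

# printf treats its first argument as a format string, so a mail line that
# happens to contain "%" sequences is reformatted instead of copied verbatim.
my $line = "Discount: 100%d off, 50%% sure\n";   # invented sample line

my $mangled;
{
    no warnings;                 # silence the "missing argument" warning
    $mangled = sprintf $line;    # what "printf" would write for this line
}

print "print  writes: $line";
print "printf writes: $mangled";
```

With print the line is written back unchanged; with printf the "%d" and "%%" sequences are interpreted and the stored mail is silently altered.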

perl -n -i -e 'print unless /^Content-Transfer-Encoding: base64/i .. /^--/' filename
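
One thing to be aware of: the range is inclusive, so the boundary line that ends it is deleted along with the data. If you would rather keep the MIME structure intact, a slightly longer sketch could keep the part headers and the boundary lines and drop only the encoded data. It also stops at an mbox "From " separator, so a message without boundaries cannot swallow the messages after it. The sub name strip_base64_bodies and the sample message are invented, and it assumes the encoding header is followed by a blank line before the data.

```perl
use strict;
use warnings;

# Drop base64 data but keep part headers and boundary lines, and never
# skip past the start of the next mbox message ("From " at line start).
sub strip_base64_bodies {
    my @lines = @_;
    my @kept;
    my $state = 'copy';              # 'copy', 'headers' or 'skip'
    for my $line (@lines) {
        if ($state eq 'skip') {
            next unless $line =~ /^(?:--|From )/;
            $state = 'copy';         # boundary or next message: resume copying
        }
        push @kept, $line;
        if ($state eq 'copy' && $line =~ /^Content-Transfer-Encoding:\s*base64/i) {
            $state = 'headers';      # remaining part headers are still copied
        }
        elsif ($state eq 'headers' && $line =~ /^\r?$/) {
            $state = 'skip';         # blank line ends the headers; data follows
        }
    }
    return @kept;
}

my @sample = (
    "From vic Fri Dec 26 10:00:00 2008\n",
    "Content-Type: multipart/mixed; boundary=\"0__=example\"\n",
    "\n",
    "--0__=example\n",
    "Content-Type: application/octet-stream\n",
    "Content-transfer-encoding: base64\n",
    "\n",
    "9SA+QEyb\n",
    "\n",
    "--0__=example--\n",
);

print strip_base64_bodies(@sample);
```

Neither "-" nor a space occurs in base64 data, so a line starting with "--" or "From " cannot be part of the encoded block.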

--
Bart.
 
 
me at
Guest
Posts: n/a
 
      12-29-2008
Sun, 28 Dec 2008 23:29:34 +0100 Bart Lateur <> wrote:
| You can just make the first regex case insensitive, and for the latter,
| I think maybe you'd better sync on leading hyphens instead of trailing
| hyphens, because now you're throwing away *everything* starting from the
| base64 encoded attachment:
|
| Oh, and you should use print, not printf, or you'll get *big* trouble if
| the line contains percent signs.
|
| perl -n -i -e 'print unless /^Content-Transfer-Encoding: base64/i .. /^--/' filename


Done,
Thanks,



 