Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Perl > Perl Misc > good email parser ??

good email parser ??

 
 
Jack
      02-07-2009
Hi, I haven't had any luck with the CPAN email modules. I just want to
parse multipart, MIME and base64 messages, but with all the varieties of
email files out there, these modules just don't work. Does anyone know
of a free or low-cost, command-line-driven email client or parser that
can do the job?

Thank you,

Jack
 
rabbits77
      02-08-2009
Jack wrote:
> Hi, I haven't had any luck with the CPAN email modules. I just want to
> parse multipart, MIME and base64 messages, but with all the varieties of
> email files out there, these modules just don't work. Does anyone know
> of a free or low-cost, command-line-driven email client or parser that
> can do the job?
>
> Thank you,

I have done some work parsing email in the (fairly distant) past.
Email really isn't that varied!
In order for email to work at all, in fact, it needs to be pretty
predictable!
I bet that you could do this yourself.
Where are your sticking points?
If I understand your question, do you just want to remove all
email attachments?
 
Peter J. Holzer
      02-08-2009
On 2009-02-07 23:59, Jack <(E-Mail Removed)> wrote:
> Hi, I haven't had any luck with the CPAN email modules. I just want to
> parse multipart, MIME and base64 messages, but with all the varieties of
> email files out there, these modules just don't work.


MIME::Parser works for me. It is a bit slow and tends to use ridiculous
amounts of memory if you want to avoid temporary files, but I have yet
to find a (syntactically correct) email which it can't parse.
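A minimal sketch of the two storage modes behind that trade-off, assuming MIME-tools is installed from CPAN. The sample message is invented, and File::Temp supplies a scratch directory in place of a real output location. output_to_core(1) keeps every decoded body in memory, while output_under() writes each body to a file in a per-message subdirectory:

```perl
use strict;
use warnings;
use MIME::Parser;
use File::Temp qw(tempdir);

# Invented sample message, for illustration only.
my $message = <<'EOM';
From: sender@example.com
Subject: test
MIME-Version: 1.0
Content-Type: text/plain

Hello from the body.
EOM

# Mode 1: keep every decoded body in memory (no body files, more RAM).
my $core_parser = MIME::Parser->new;
$core_parser->output_to_core(1);
my $in_core = $core_parser->parse_data($message);

# Mode 2: write decoded bodies to files under a scratch directory.
my $dir = tempdir(CLEANUP => 1);
my $file_parser = MIME::Parser->new;
$file_parser->output_under($dir);
my $on_disk = $file_parser->parse_data($message);

print "in core: ", $in_core->bodyhandle->as_string;
print "on disk: ", $on_disk->bodyhandle->path, "\n";
```

For large attachments the on-disk mode is usually the safer default. The in-core mode is convenient when messages are small and you want to avoid touching the filesystem.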

hp
 
Jack
      02-09-2009
On Feb 8, 12:08 pm, "Peter J. Holzer" <(E-Mail Removed)> wrote:
> On 2009-02-07 23:59, Jack <(E-Mail Removed)> wrote:
>
> > Hi, I haven't had any luck with the CPAN email modules. I just want to
> > parse multipart, MIME and base64 messages, but with all the varieties of
> > email files out there, these modules just don't work.

>
> MIME::Parser works for me. It is a bit slow and tends to use ridiculous
> amounts of memory if you want to avoid temporary files, but I have yet
> to find a (syntactically correct) email which it can't parse.
>
> hp


Thanks for the posting, Peter. Can you provide some guidance, then? I
tried the code below, expecting the skeleton to report the base64
image attachments in a MIME message, but it isn't picking them up. I need
to be able to deal with the text body, base64 bodies, and image
attachments, and want to parse them out correctly. I can do the base64
decoding, etc. How do I accomplish this with MIME::Parser?

Code:
use MIME::Parser;

if (@ARGV[0] eq undef) {
    $filename1="no dest filename" ;
} else {
    $filename1=@ARGV[0];
}

### Create a new parser object:
my $parser = new MIME::Parser;

### Tell it where to put things:
$parser->output_under("e:\\tmp");

### Parse an input filehandle:
$entity = $parser->parse($filename1);

### Congratulations: you now have a (possibly multipart) MIME entity!
$entity->dump_skeleton;

#### HERE'S THE OUTPUT
Content-type: text/plain
Effective-type: text/plain
Content-encoding: 7bit
Body-location: (IN CORE)
Body-size: 0
--

####
It appears not to be picking up this part of the email itself:
Content-Type: image/jpeg; name="cardamage1.jpg"
Content-Disposition: attachment; filename="cardamage1.jpg"
Content-Transfer-Encoding: base64
X-Attachment-Id: f_fqzhlhly0


###
Also, I tried to build my own parser based on the "boundary" definition,
but as you can see from the example below, it's not clear why I have
more than one boundary!

Date: Sun, 24 Aug 2008 06:46:48 -0700
From: "Ben Brewster" <(E-Mail Removed)>
To: (E-Mail Removed)
Subject: car for sale two images
MIME-Version: 1.0
Content-Type: multipart/mixed;
        boundary="----=_Part_13503_152406.1219585608169"

------=_Part_13503_152406.1219585608169
Content-Type: multipart/alternative;
        boundary="----=_Part_13504_19292996.1219585608169"

------=_Part_13504_19292996.1219585608169
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

Hi


------=_Part_13504_19292996.1219585608169
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

<div dir="ltr"></div>

------=_Part_13504_19292996.1219585608169--

------=_Part_13503_152406.1219585608169
Content-Type: image/jpeg; name=masertione.jpg
Content-Transfer-Encoding: base64
X-Attachment-Id: f_fk9pr8s20
Content-Disposition: attachment; filename=masertione.jpg
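A likely explanation for the two boundaries: the outer multipart/mixed part declares one boundary, and the nested multipart/alternative part (plain text plus HTML) declares its own, so each level of nesting has its own delimiter. Each delimiter line in the body is "--" followed by the boundary, and the closing delimiter adds a trailing "--". The core-Perl sketch below lists the declared boundaries in order. The helper name multipart_boundaries is hypothetical, and its regex only handles simple header layouts like the one above, so it is not a replacement for a real MIME parser:

```perl
use strict;
use warnings;

# Hypothetical helper: returns [type, boundary] for every multipart
# Content-Type header in a raw message, in order of appearance.
sub multipart_boundaries {
    my ($raw) = @_;
    # Unfold header continuation lines (newline followed by whitespace).
    (my $unfolded = $raw) =~ s/\r?\n[ \t]+/ /g;
    my @found;
    while ($unfolded =~ /^Content-Type:\s*(multipart\/[\w.+-]+)\s*;[^\n]*?boundary="?([^";\n]+)"?/mig) {
        push @found, [ lc $1, $2 ];
    }
    return @found;
}

# Trimmed copy of the structure of the message above.
my $raw = <<'EOM';
Content-Type: multipart/mixed;
        boundary="----=_Part_13503_152406.1219585608169"

------=_Part_13503_152406.1219585608169
Content-Type: multipart/alternative;
        boundary="----=_Part_13504_19292996.1219585608169"

------=_Part_13504_19292996.1219585608169
Content-Type: text/plain; charset=ISO-8859-1

Hi

------=_Part_13504_19292996.1219585608169--

------=_Part_13503_152406.1219585608169--
EOM

# Print each type next to the delimiter line its boundary produces.
for my $pair (multipart_boundaries($raw)) {
    printf "%-21s --%s\n", @$pair;
}
```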
 
John W. Krahn
      02-10-2009
Jack wrote:
> On Feb 8, 12:08 pm, "Peter J. Holzer" <(E-Mail Removed)> wrote:
>> On 2009-02-07 23:59, Jack <(E-Mail Removed)> wrote:
>>
>>> Hi, I haven't had any luck with the CPAN email modules. I just want to
>>> parse multipart, MIME and base64 messages, but with all the varieties of
>>> email files out there, these modules just don't work.

>> MIME::Parser works for me. It is a bit slow and tends to use ridiculous
>> amounts of memory if you want to avoid temporary files, but I have yet
>> to find a (syntactically correct) email which it can't parse.

>
> Thanks for the posting, Peter. Can you provide some guidance, then? I
> tried the code below, expecting the skeleton to report the base64
> image attachments in a MIME message, but it isn't picking them up. I need
> to be able to deal with the text body, base64 bodies, and image
> attachments, and want to parse them out correctly. I can do the base64
> decoding, etc. How do I accomplish this with MIME::Parser?
>
> Code:


use warnings;
use strict;

> use MIME::Parser;
>
> if (@ARGV[0] eq undef) {


You cannot use undef in a comparison. Perl will just convert it
internally to a numeric, or in this case, a string representation of
"false", 0 or '' respectively. You shouldn't use a list in scalar
context. If you had warnings enabled then perl would have warned about
this.

if ( not defined $ARGV[ 0 ] ) {

> $filename1="no dest filename" ;
> } else {
> $filename1=@ARGV[0];


$filename1 = $ARGV[ 0 ];

> }


Or if you have Perl version 5.10 installed you could write that as:

my $filename1 = $ARGV[ 0 ] // 'no dest filename';

For older perls that would be:

my $filename1 = defined $ARGV[ 0 ] ? $ARGV[ 0 ] : 'no dest filename';
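Both forms differ from the common || idiom when the argument is defined but false, such as a file literally named "0". A small sketch of the difference. The helpers pick_or and pick_dor are hypothetical stand-ins that take the argument list explicitly instead of reading @ARGV, and // needs Perl 5.10 or later:

```perl
use strict;
use warnings;
use 5.010;    # enables // and say()

# Hypothetical helpers standing in for code that reads $ARGV[0].
sub pick_or  { my @args = @_; return $args[0] || 'no dest filename' }
sub pick_dor { my @args = @_; return $args[0] // 'no dest filename' }

say pick_or('0');    # prints "no dest filename": '0' is false
say pick_dor('0');   # prints "0": '0' is defined
say pick_dor();      # prints "no dest filename": no argument at all
```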



John
--
Those people who think they know everything are a great
annoyance to those of us who do. -- Isaac Asimov
 
Uri Guttman
      02-10-2009
>>>>> "JWK" == John W Krahn <(E-Mail Removed)> writes:

>> use MIME::Parser;
>> if (@ARGV[0] eq undef) {


JWK> You cannot use undef in a comparison. Perl will just convert it
JWK> internally to a numeric, or in this case, a string representation of
JWK> "false", 0 or '' respectively. You shouldn't use a list in scalar
JWK> context. If you had warnings enabled then perl would have warned
JWK> about this.

couple of nits to pick. undef is coerced to '' with eq since it is
string context. and @ARGV[0] is a slice but it will return a single
value here. sure it is incorrect but it will work.

JWK> if ( not defined $ARGV[ 0 ] ) {

>> $filename1="no dest filename" ;
>> } else {
>> $filename1=@ARGV[0];


JWK> $filename1 = $ARGV[ 0 ];

>> }


JWK> Or if you have Perl version 5.10 installed you could write that as:

JWK> my $filename1 = $ARGV[ 0 ] // 'no dest filename';

JWK> For older perls that would be:

JWK> my $filename1 = defined $ARGV[ 0 ] ? $ARGV[ 0 ] : 'no dest filename';

you should know better. the best way to check for elements in an array
is checking its count. since he wants only one arg this should do fine:

@ARGV or die "missing file name argument" ;
my $filename = shift ;

and to the OP, you can never have an undef in @ARGV unless you put it
there yourself. @ARGV is passed in from the exec call (the shell does
this for command line programs) and shell doesn't know about undef.

uri

--
Uri Guttman ------ (E-Mail Removed) -------- http://www.sysarch.com --
----- Perl Code Review , Architecture, Development, Training, Support ------
--------- Free Perl Training --- http://perlhunter.com/college.html ---------
--------- Gourmet Hot Cocoa Mix ---- http://bestfriendscocoa.com ---------
 
Hans Mulder
      02-10-2009
Jack wrote:
> On Feb 8, 12:08 pm, "Peter J. Holzer" <(E-Mail Removed)> wrote:
>> On 2009-02-07 23:59, Jack <(E-Mail Removed)> wrote:
>>
>>> Hi, I haven't had any luck with the CPAN email modules. I just want to
>>> parse multipart, MIME and base64 messages, but with all the varieties of
>>> email files out there, these modules just don't work.

>> MIME::Parser works for me. It is a bit slow and tends to use ridiculous
>> amounts of memory if you want to avoid temporary files, but I have yet
>> to find a (syntactically correct) email which it can't parse.
>>
>> hp

>
> Thanks for the posting, Peter. Can you provide some guidance, then? I
> tried the code below, expecting the skeleton to report the base64
> image attachments in a MIME message, but it isn't picking them up.


The parse() method takes a file handle argument. So you'll have to
open the file yourself and pass the resulting handle to parse():

use warnings;
use strict;

use MIME::Parser;

my $dir = "e:\\tmp";

if (not -d $dir) {
    mkdir $dir or die "Can't create directory $dir: $!\n";
}

my $filename1 = $ARGV[0] || "no input filename";

### Create a new parser object:
my $parser = new MIME::Parser;

### Tell it where to put things:
$parser->output_under($dir);

### Open the file:
open my $fh, '<', $filename1 or die "Can't read $filename1: $!\n";

### Parse an input filehandle:
my $entity = $parser->parse($fh);

### Congratulations: you now have a (possibly multipart) MIME entity!
$entity->dump_skeleton;
__END__

This prints:

Content-type: multipart/mixed
Effective-type: multipart/mixed
Body-file: NONE
Subject: car for sale two images
Num-parts: 2
--
    Content-type: multipart/alternative
    Effective-type: multipart/alternative
    Body-file: NONE
    Num-parts: 2
    --
        Content-type: text/plain
        Effective-type: text/plain
        Body-file: e:\tmp/msg-1234304022-16083-0/msg-16083-1.txt
        --
        Content-type: text/html
        Effective-type: text/html
        Body-file: e:\tmp/msg-1234304022-16083-0/msg-16083-2.html
        --
    Content-type: image/jpeg
    Effective-type: image/jpeg
    Body-file: e:\tmp/msg-1234304022-16083-0/masertione.jpg
    Recommended-filename: masertione.jpg
    --

> I need
> to be able to deal with the text body, base64 bodies, and image
> attachments, and want to parse them out correctly. I can do the base64
> decoding, etc.


MIME::Parser will do the base64 decoding for you.
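A sketch of walking every part after parsing, assuming MIME-tools is installed. The two-part sample message is invented, and its base64 payload decodes to "hello world". parts_DFS() returns the entity itself followed by all nested parts, multipart containers have no bodyhandle, and each leaf's bodyhandle already holds the decoded data:

```perl
use strict;
use warnings;
use MIME::Parser;
use File::Temp qw(tempdir);

# Invented sample: a text part plus a base64-encoded attachment.
my $message = <<'EOM';
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="XYZ"

--XYZ
Content-Type: text/plain

Hi
--XYZ
Content-Type: application/octet-stream; name="note.bin"
Content-Disposition: attachment; filename="note.bin"
Content-Transfer-Encoding: base64

aGVsbG8gd29ybGQ=
--XYZ--
EOM

my $parser = MIME::Parser->new;
$parser->output_under(tempdir(CLEANUP => 1));    # stands in for e:\tmp
my $entity = $parser->parse_data($message);

# Visit the top-level entity and every nested part, depth first.
my @leaves;
for my $part ($entity->parts_DFS) {
    my $body = $part->bodyhandle or next;    # skip multipart containers
    push @leaves, {
        type     => $part->effective_type,
        filename => $part->head->recommended_filename,    # undef if none
        path     => $body->path,          # decoded file on disk
        data     => $body->as_string,     # decoded contents
    };
    printf "%-26s %s\n", $leaves[-1]{type}, $leaves[-1]{path};
}
```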

> How do I accomplish this with MIME::Parser?


Read the documentation carefully:

    parse INSTREAM
        Instance method. Takes a MIME-stream and splits it into its
        component entities.

        The INSTREAM can be given as a readable FileHandle, an IO::File, a
        globref filehandle (like "\*STDIN"), or as any blessed object
        conforming to the IO:: interface (which minimally implements getline()
        and read()).

It does not mention the possibility of passing a filename and parse()
opening it on your behalf. This suggests that this feature does not
exist in this version of MIME::Parser.
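For what it's worth, MIME::Parser also documents a separate parse_open() method that accepts anything you could pass as the second argument of open(), such as a filename. A sketch of both routes, assuming a MIME-tools version that provides parse_open(); the temporary message file is made up for illustration:

```perl
use strict;
use warnings;
use MIME::Parser;
use File::Temp qw(tempdir tempfile);

# Write a throwaway message to disk so the example is self-contained.
my ($tmp_fh, $msg_file) = tempfile(UNLINK => 1);
print {$tmp_fh} "Subject: demo\nContent-Type: text/plain\n\nbody text\n";
close $tmp_fh or die "close: $!";

my $parser = MIME::Parser->new;
$parser->output_under(tempdir(CLEANUP => 1));

# Route 1: open the file yourself and hand parse() the filehandle.
open my $in, '<', $msg_file or die "Can't read $msg_file: $!\n";
my $from_handle = $parser->parse($in);
close $in;

# Route 2: let parse_open() do the open() for you.
my $from_name = $parser->parse_open($msg_file);

print $from_handle->head->get('Subject');
print $from_name->head->get('Subject');
```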

Hope this helps,

-- HansM

 
Jack
      02-21-2009
On Feb 10, 2:38 pm, Hans Mulder <(E-Mail Removed)> wrote:
> MIME::Parser will do the base64 decoding for you.

Thanks, Hans. Can you tell me whether MIME::Parser will handle/process
plain RFC (non-MIME) emails?
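A minimal sketch of what MIME::Parser does with a message that has no MIME headers at all, assuming MIME-tools is installed. As far as I know, such a message becomes a single, non-multipart entity whose type defaults to text/plain. The sample message is invented:

```perl
use strict;
use warnings;
use MIME::Parser;

# Invented plain RFC 822-style message: no MIME-Version or Content-Type.
my $plain = <<'EOM';
From: someone@example.com
To: someone-else@example.com
Subject: no MIME here

Just a plain text body.
EOM

my $parser = MIME::Parser->new;
$parser->output_to_core(1);    # keep the body in memory
my $entity = $parser->parse_data($plain);

print $entity->is_multipart ? "multipart\n" : "single part\n";
print $entity->effective_type, "\n";    # type used when none is declared
print $entity->bodyhandle->as_string;
```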
 
Jack
      02-21-2009
On Feb 10, 2:38 pm, Hans Mulder <(E-Mail Removed)> wrote:
> Body-file: e:\tmp/msg-1234304022-16083-0/masertione.jpg

Also, how does one capture the name of the directory it creates on the
fly in a variable?
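One way to do this, sketched under assumptions: with output_under(), each message gets its own subdirectory and every body written to disk lives in it, so the directory can be recovered from a body's path() with File::Basename's dirname(). The helper message_dir() is hypothetical, the one-line message is invented, and a temporary directory stands in for e:\tmp:

```perl
use strict;
use warnings;
use MIME::Parser;
use File::Basename qw(dirname);
use File::Temp qw(tempdir);

my $parser = MIME::Parser->new;
$parser->output_under(tempdir(CLEANUP => 1));    # stands in for e:\tmp
my $entity = $parser->parse_data(
    "Subject: demo\nContent-Type: text/plain\n\nbody\n");

# Hypothetical helper: the per-message directory is wherever the first
# on-disk body was written.
sub message_dir {
    my ($ent) = @_;
    for my $part ($ent->parts_DFS) {
        my $body = $part->bodyhandle or next;
        return dirname($body->path) if defined $body->path;
    }
    return;    # nothing was written to disk
}

my $msg_dir = message_dir($entity);
print "bodies were written to $msg_dir\n";
```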
 

Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post
import parser does not import parser.py in same dir on win | Joel Hedlund | Python | 2 | 11-11-2006 03:46 PM
import parser does not import parser.py in same dir on win | Joel Hedlund | Python | 0 | 11-11-2006 11:34 AM
XML Parser VS HTML Parser | ZOCOR | Java | 11 | 10-05-2004 01:58 PM
XMLparser: Difference between parser.setErrorHandler() vs. parser.setContentHandler() | Bernd Oninger | Java | 0 | 06-09-2004 01:26 AM
XMLparser: Difference between parser.setErrorHandler() vs. parser.setContentHandler() | Bernd Oninger | XML | 0 | 06-09-2004 01:26 AM


