Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Perl > Perl Misc > Could someone give me suggestions with my code?

Reply
Thread Tools

Could someone give me suggestions with my code?

 
 
Jonathan Clark
Guest
Posts: n/a
 
      07-11-2008
Hello, all.

I'm re-learning perl (again), and have put myself into a while loop which
i've just broken out of ( while $problemSolved != 1 {thinkOfSolution
();trySolution();} ), I'd like to post my code, complete with example
data, for you to look over, and give some suggestions on how to improve
my style.

Most of the hard stuff was figuring out the regexps I needed for this
task. It's essentially a two-rexexp solution at the moment, but was
wondering if it could be done.

I've created this with the help of perlfaq, other perldocs, Beginning
Perl, and reading this newsgroup. I did some copy-pasting from the
perldoc IIRC, for the 'framework' of the program.

Oh, yes, the purpose of this program is to strip headers from newsgroup
messages so that I can feed the files to my megahal bot later

------- headerstrip.pl --------

#!/usr/bin/perl
use warnings;
use strict;
undef $/; #read that this allows the whole file to be read in, rather
than as lines or paragraphs.

my $file = "@ARGV"; #Get the filename passed from the command line


open my $in, '<', $file or die "Can't read old file: $!";
open my $out, '>', "$file.new" or die "Can't write new file: $!";
#Standard file openings, opens the original for reading, and the new file
#(with .new extension added) to be changed...


while( <$in> )
{
s/^Path(.*?)^Xref//gsm; #clears out most of the headers,
from the Path line to the end of the word Xref, where Xref starts on it's
own line
s/^: news.*\n//; #removes the rest of that Xref line
print $out $_; #writes file
}

close $out; #And, close the file handle
--------- EOF ------------

\

Now, I'd like to be able to pass multiple files to it, and I can tell
from my limited knowledge of perl that a) it isn't gonna work yet, and b)
I'll probably have to stick my $file [...] close $out; into a while loop,
most likely with @ARGV (or a 'copy' of it) as the iterator... Or would
that be a for loop? hmm...

I'll leave you Perl gurus to pointing out my mistakes, and await your
suggestions

--
Clarjon1


Bingo, gas station, hamburger with a side order of airplane noise,
and you'll be Gary, Indiana. - Jessie in the movie "Greaser's Palace"
 
Reply With Quote
 
 
 
 
Michael Carman
Guest
Posts: n/a
 
      07-12-2008
Jonathan Clark wrote:
> #!/usr/bin/perl
> use warnings;
> use strict;


So far, so good.

> undef $/; #read that this allows the whole file to be read in, rather
> than as lines or paragraphs.


Slurping is okay for small files but doesn't scale. It should be fine
for newsgroup postings.

> my $file = "@ARGV"; #Get the filename passed from the command line


If the user specifies multiple arguments this will concatenate them. You
only want one argument.

my $file = shift @ARGV;

You might want to throw an error if the user doesn't specify a file at
all. For example:

die "Usage: $0 <file>\n";

> open my $in, '<', $file or die "Can't read old file: $!";
> open my $out, '>', "$file.new" or die "Can't write new file: $!";


Good x 3. You're using three-arg open(), checking its return value, and
including the error in any message.


> while( <$in> )


Since you're slurping the file you don't need a loop at all.

$_ = <$in>;

> {
> s/^Path(.*?)^Xref//gsm; #clears out most of the headers,
> from the Path line to the end of the word Xref, where Xref starts on it's
> own line


I suspect the /g isn't necessary there as it appears you're trying to
remove all the headers in a single pass.

Since you aren't using $1 and don't need any grouping you can get rid of
the parentheses.

s/^Path.*?^Xref//sm;

> s/^: news.*\n//; #removes the rest of that Xref line
> print $out $_; #writes file


I think you can merge the two regular expressions into one.
[Note: untested]

s/^Path.*?^Xref:[^\n]*\n/sm;

Are you trying to delete *all* message headers? What if there's
something after Xref?

I'd process the file (without undef'ing $/) this way:

while (<$in>) {
next if /^[\w-]+:/ .. /^(?![\w-]+/; # skip header lines
print $out $_; # print first line of message
print $out <$in>; # print rest of file
}

> }
>
> close $out; #And, close the file handle


This isn't strictly necessary, although I prefer to explicitly close
handles. Be consistent though. Either close both $in and $out or neither.

> Now, I'd like to be able to pass multiple files to it, and I can tell
> from my limited knowledge of perl that a) it isn't gonna work yet, and b)
> I'll probably have to stick my $file [...] close $out; into a while loop,
> most likely with @ARGV (or a 'copy' of it) as the iterator... Or would
> that be a for loop? hmm...


You need a loop. The idiomatic loop for perl is a "foreach" loop.

foreach my $file (@ARGV) {
next unless -f $file; # skip missing or non-text files
# (open files)
# (process file)
}


-mjc
 
Reply With Quote
 
 
 
 
Uri Guttman
Guest
Posts: n/a
 
      07-12-2008
>>>>> "MC" == Michael Carman <(E-Mail Removed)> writes:

MC> Jonathan Clark wrote:
>> #!/usr/bin/perl
>> use warnings;
>> use strict;


MC> So far, so good.

>> undef $/; #read that this allows the whole file to be read in,
>> rather than as lines or paragraphs.


MC> Slurping is okay for small files but doesn't scale. It should be fine
MC> for newsgroup postings.

the definition of small files has grown considerably. it used to mean
<1k and today 1mb can be condsidered small. most common text files
(sources, *ML, docs, configs, etc.) are easily slurped. and slurping if
done right is faster. of course doing it right means using
File::Slurp. it is cleaner code than undefing $/ and it is faster too.


MC> die "Usage: $0 <file>\n";

>> open my $in, '<', $file or die "Can't read old file: $!";
>> open my $out, '>', "$file.new" or die "Can't write new file: $!";


MC> Good x 3. You're using three-arg open(), checking its return value,
MC> and including the error in any message.

a good extra is to put the file name in the (improved) error string.

open my $in, '<', $file or die "Can't open $file: $!";
open my $out, '>', "$file.new" or die "Can't create $file.new: $!";

MC> Since you're slurping the file you don't need a loop at all.

MC> $_ = <$in>;

use File::Slurp ;

# no need to call open and <>. also no need to touch $/
my $post = read_file( $file ) ;

MC> Are you trying to delete *all* message headers? What if there's
MC> something after Xref?

then delete up to the first blank line (2 newlines or crlf
pairs). someone else commented on this already

MC> I'd process the file (without undef'ing $/) this way:

MC> while (<$in>) {
MC> next if /^[\w-]+:/ .. /^(?![\w-]+/; # skip header lines

what about continuation header lines?

uri

--
Uri Guttman ------ http://www.velocityreviews.com/forums/(E-Mail Removed) -------- http://www.sysarch.com --
----- Perl Code Review , Architecture, Development, Training, Support ------
--------- Free Perl Training --- http://perlhunter.com/college.html ---------
--------- Gourmet Hot Cocoa Mix ---- http://bestfriendscocoa.com ---------
 
Reply With Quote
 
Dr.Ruud
Guest
Posts: n/a
 
      07-12-2008
Uri Guttman schreef:

> today 1mb can be condsidered small


millibits were already small.

--
Affijn, Ruud

"Gewoon is een tijger."
 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Could someone plase give me some advice? Cal MCSE 0 08-23-2007 10:31 AM
Could someone give me a hand setting up GMail's POP access please? Miss Perspicacia Tick Computer Support 3 01-09-2005 02:13 AM
gmail invites - Could someone give me one? Ed NZ Computing 9 10-30-2004 04:11 AM
Could someone please give me info Anita MCAD 4 03-05-2004 08:35 PM
Could someone please give info Anita MCSD 2 03-05-2004 02:03 AM



Advertisments