Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Perl > Perl Misc > Matching spaces at start of line

Reply
Thread Tools

Matching spaces at start of line

 
 
Paul Jones
Guest
Posts: n/a
 
      11-24-2005
Perhaps someone can help me with a little perl problem.

I want to take an message and write it out to a file, but I only
want specific header information. I have included what I have
got below (ripped from a mail2news script on the web), however
the comments header is often split over multiple lines (usually
4) and I want to include them all. The 2-4th lines of the header
all begin with spaces (usually 6 - perhaps its a tab).

The program does exactly what I want of it, except for missing
the lines beginning with a space.




open (INEWS, "| $program $options") ||
die "$program: can't run $news_poster_program\n";

# header loop
while (<STDIN>) {
last if /^$/;

s/(?i)^Date/Date/;
s/(?i)^From/From/;
s/(?i)^Subject/Subject/;
s/(?i)^Newsgroups/Newsgroups/;
s/(?i)^Comments/Comments/;

print INEWS

if /^(Date|From|Subject|Newsgroups|Comments):/i;
$saw_newsgroup |= ( $+ eq 'Newsgroups' );

}

die "$program: didn't get newsgroup from headers\n"
unless $saw_newsgroup;

print INEWS "\n";

print INEWS while <STDIN>; # gobble rest of message


 
Reply With Quote
 
 
 
 
Dr.Ruud
Guest
Posts: n/a
 
      11-24-2005
Paul Jones schreef:

> Perhaps someone can help me with a little perl problem.
>
> I want to take an message and write it out to a file, but I only
> want specific header information. I have included what I have
> got below (ripped from a mail2news script on the web), however
> the comments header is often split over multiple lines (usually
> 4) and I want to include them all. The 2-4th lines of the header
> all begin with spaces (usually 6 - perhaps its a tab).
>
> The program does exactly what I want of it, except for missing
> the lines beginning with a space.



use strict;
use warnings;

> open (INEWS, "| $program $options") ||
> die "$program: can't run $news_poster_program\n";
>
> # header loop
> while (<STDIN>) {
> last if /^$/;


You process line-by-line, but some header fields are folded. Reading the
whole header at once will make things simpler, you won't even need the
while-loop. See perldoc perlvar, specifically $/, the input record
separator.

local $/ = '';

When the header is in $_, you can use 's/\n\s+/ /g' to unfold the folded
header fields.
(err, that \s needs to be [[:blank:]])

After that you can remove the header fields that you don't want:

s/\n(?!(Date|From|Subject|Newsgroups|Comments).*//gi


> s/(?i)^Date/Date/;
> s/(?i)^From/From/;
> s/(?i)^Subject/Subject/;
> s/(?i)^Newsgroups/Newsgroups/;
> s/(?i)^Comments/Comments/;


It is better to include the ':' too:

s/^date:/Date:/mi;


Basically:

#!/usr/bin/perl

use strict;
use warnings;

{ local ($/, $\) = ('', "\n\n");

# read the header
$_ = <STDIN>;

# unfold header fields
s/\n[[:blank:]]+/ /g;

# remove unwanted header fields
s/\n(?!(Date|From|Subject|etc.).*//gi;

# standardize header field names
s/^date:/Date:/mi;
# etc.

print;

# read and print the body
undef $/;
undef $\;
$_ = <STDIN>;
print;
}

--
Affijn, Ruud

"Gewoon is een tijger."

 
Reply With Quote
 
 
 
 
Tad McClellan
Guest
Posts: n/a
 
      11-24-2005
Paul Jones <(E-Mail Removed)> wrote:

> the comments header is often split over multiple lines (usually
> 4) and I want to include them all. The 2-4th lines of the header
> all begin with spaces (usually 6 - perhaps its a tab).



> print INEWS
>
> if /^(Date|From|Subject|Newsgroups|Comments):/i;



if /^(Date|From|Subject|Newsgroups|Comments):/i or /^\s/;


--
Tad McClellan SGML consulting
http://www.velocityreviews.com/forums/(E-Mail Removed) Perl programming
Fort Worth, Texas
 
Reply With Quote
 
John W. Krahn
Guest
Posts: n/a
 
      11-25-2005
Tad McClellan wrote:
> Paul Jones <(E-Mail Removed)> wrote:
>
>>the comments header is often split over multiple lines (usually
>>4) and I want to include them all. The 2-4th lines of the header
>>all begin with spaces (usually 6 - perhaps its a tab).

>
>
>> print INEWS
>>
>> if /^(Date|From|Subject|Newsgroups|Comments):/i;

>
>
> if /^(Date|From|Subject|Newsgroups|Comments):/i or /^\s/;


That is going to print from every header that is folded, not just the Comments
header.


John
--
use Perl;
program
fulfillment
 
Reply With Quote
 
John W. Krahn
Guest
Posts: n/a
 
      11-25-2005
Paul Jones wrote:
> Perhaps someone can help me with a little perl problem.
>
> I want to take an message and write it out to a file, but I only
> want specific header information. I have included what I have
> got below (ripped from a mail2news script on the web), however
> the comments header is often split over multiple lines (usually
> 4) and I want to include them all. The 2-4th lines of the header
> all begin with spaces (usually 6 - perhaps its a tab).
>
> The program does exactly what I want of it, except for missing
> the lines beginning with a space.
>
>
>
> open (INEWS, "| $program $options") ||
> die "$program: can't run $news_poster_program\n";
>
> # header loop
> while (<STDIN>) {
> last if /^$/;
>
> s/(?i)^Date/Date/;
> s/(?i)^From/From/;
> s/(?i)^Subject/Subject/;
> s/(?i)^Newsgroups/Newsgroups/;
> s/(?i)^Comments/Comments/;
>
> print INEWS
>
> if /^(Date|From|Subject|Newsgroups|Comments):/i;
> $saw_newsgroup |= ( $+ eq 'Newsgroups' );
>
> }
>
> die "$program: didn't get newsgroup from headers\n"
> unless $saw_newsgroup;
>
> print INEWS "\n";
>
> print INEWS while <STDIN>; # gobble rest of message


Here is one way to do it:


open INEWS, '|-', split( ' ', "$program $options" )
or die "$program: can't run $news_poster_program: $!";

my $saw_newsgroup;
while ( <STDIN> ) {
last if /^$/;
if ( s/^(comments/\u\L$1/i ) {
my $line = <>;
if ( $line =~ /^\s/ ) {
$_ .= $line;
}
else {
print INEWS;
$_ = $line;
}
redo;
}
elsif ( s/^(newsgroups/\u\L$1/i ) {
$saw_newsgroup++;
}
elsif ( ! s/^(date|from|subject):/\u\L$1:/i ) {
next;
}
print INEWS;
}

die "$program: didn't get newsgroup from headers\n"
unless $saw_newsgroup;

print INEWS "\n";
print INEWS while <STDIN>;

close INEWS
or warn $! ? "Error closing $program: $!"
: "Exit status $? from $program";

__END__



John
--
use Perl;
program
fulfillment
 
Reply With Quote
 
Anno Siegel
Guest
Posts: n/a
 
      11-25-2005
John W. Krahn <(E-Mail Removed)> wrote in comp.lang.perl.misc:
> Tad McClellan wrote:
> > Paul Jones <(E-Mail Removed)> wrote:
> >
> >>the comments header is often split over multiple lines (usually
> >>4) and I want to include them all. The 2-4th lines of the header
> >>all begin with spaces (usually 6 - perhaps its a tab).

> >
> >
> >> print INEWS
> >>
> >> if /^(Date|From|Subject|Newsgroups|Comments):/i;

> >
> >
> > if /^(Date|From|Subject|Newsgroups|Comments):/i or /^\s/;

>
> That is going to print from every header that is folded, not just the Comments
> header.


I think it should, for those headers that are to be printed at all. I
know that OP singled out the Comments header, but that's probably just
because it's the only one that *has* continuation lines with any regularity.

I read the problem as "Print all valid headers ( Date, From, ...) and
their continuation lines, suppress all others, including their continuation
lines". Could be done like this:

my $saw_newsgroup;
my $valid_header;
while (<DATA>) {
last if /^$/;

if ( /^\S/ ) {
$valid_header =
s{^(Date|From|Subject|Newsgroups|Comments):}
{\u\L$1:}i;
$saw_newsgroup ||= $valid_header && $1 eq 'Newsgroups';
}
print if $valid_header;
}
die "didn't get newsgroup from headers\n" unless $saw_newsgroup;

print "\n";
print while <DATA>; # gobble rest of message

Besides adding the logic to control $valid_header, I have used the
s/// operation that normalizes header spelling to determine if
we have a valid header. This way, the list of valid headers appears
only once in the code, while the original code had it twice. A
maintainer will appreciate that.

Anno
--
If you want to post a followup via groups.google.com, don't use
the broken "Reply" link at the bottom of the article. Click on
"show options" at the top of the article, then click on the
"Reply" at the bottom of the article headers.
 
Reply With Quote
 
Paul Jones
Guest
Posts: n/a
 
      11-25-2005
In article <(E-Mail Removed)>
"Dr.Ruud" <(E-Mail Removed)> wrote:
>
> Paul Jones schreef:
>
> > Perhaps someone can help me with a little perl problem.
> >
> > I want to take an message and write it out to a file, but I only
> > want specific header information. I have included what I have
> > got below (ripped from a mail2news script on the web), however
> > the comments header is often split over multiple lines (usually
> > 4) and I want to include them all. The 2-4th lines of the header
> > all begin with spaces (usually 6 - perhaps its a tab).
> >
> > The program does exactly what I want of it, except for missing
> > the lines beginning with a space.

>
>
> use strict;
> use warnings;
>
> > open (INEWS, "| $program $options") ||
> > die "$program: can't run $news_poster_program\n";
> >
> > # header loop
> > while (<STDIN>) {
> > last if /^$/;

>
> You process line-by-line, but some header fields are folded. Reading the
> whole header at once will make things simpler, you won't even need the
> while-loop. See perldoc perlvar, specifically $/, the input record
> separator.
>



<snip great advice>



> Basically:


Thanks very much!

Unfortunately I chopped my script down a bit to post it on here,
thinking that as some of it is in essence just repeated, or so I
thought, I needent bother you with it all (it's really about 2-
3x the length of what I posted). Unfortunately my first attempt
at expanding what you have given me failed to work fairly
miserably!

I will try a little longer at getting it to work, but if I still
have problems I'll post the whole thing and maybe you can point
out what I'm doing wrong. I'm very new to perl and all I know is
what I've managed to get from the web or from other peoples
annotated scripts.







 
Reply With Quote
 
Paul Jones
Guest
Posts: n/a
 
      11-25-2005
In article <dm6slf$se2$(E-Mail Removed)-Berlin.DE>
http://www.velocityreviews.com/forums/(E-Mail Removed)-berlin.de (Anno Siegel) wrote:
>
> John W. Krahn <(E-Mail Removed)> wrote in comp.lang.perl.misc:
> > Tad McClellan wrote:
> > > Paul Jones <(E-Mail Removed)> wrote:
> > >
> > >>the comments header is often split over multiple lines (usually
> > >>4) and I want to include them all. The 2-4th lines of the header
> > >>all begin with spaces (usually 6 - perhaps its a tab).
> > >
> > >
> > >> print INEWS
> > >>
> > >> if /^(Date|From|Subject|Newsgroups|Comments):/i;
> > >
> > >
> > > if /^(Date|From|Subject|Newsgroups|Comments):/i or /^\s/;

> >
> > That is going to print from every header that is folded, not just the Comments
> > header.

>
> I think it should, for those headers that are to be printed at all. I
> know that OP singled out the Comments header, but that's probably just
> because it's the only one that *has* continuation lines with any regularity.
>
> I read the problem as "Print all valid headers ( Date, From, ...) and
> their continuation lines, suppress all others, including their continuation
> lines". Could be done like this:


Perhaps a better way of approaching the problem would have been
to print all headers except a list that should be excluded.

Anyway, maybe I should have just listed the entire requirements
of what I wanted, rather than show you what I've got and ask how
to do it better. So here we go:

The message is routed via Exim4 in it's transport phase.

mail2news_transport:
debug_print = "T: mail2news for $local_part@$domain"
driver = pipe
command = /usr/local/bin/mail2news.pl
user = mail

It's then picked up by the perl script.

The perl script starts with:



$news_poster_program = "/usr/bin/rnews";
$news_poster_options = "-r localhost";

open (INEWS, "| $news_poster_program $news_poster_options") ||
die "$program: can't run $news_poster_program\n";



Then it goes on to work on the headers. I would like to exclude:

Control:
Supersedes:
Authorised:
Received:

(Note Received: can go over multiple lines)

Certain headers need to be supplied or it fails:

From:
Newsgroups:
Date:


Some are required but can be created if absent:

Subject:
Message-ID:
Organization:

(I think I've got that bit sorted)

($sec,$min,$hour,$mday,$mon,$year)=localtime(time) ;
$madeupid = "\<$year$mon$mday.$hour$min$sec.$$\@myhost.com \>";

print INEWS "Subject: Untitled\n" unless $saw_subject;
printf INEWS "Message-ID: %s\n", $madeupid unless $saw_msgid;
print INEWS "Organization: Unknown\n" unless $saw_organization;


Some must be created (removing any that already exist):

print INEWS "Path: posted\n";


Others should be printed, unless excluded.

One final thing - it would be nice to grab the IP address from
the first Received header to add it to an NNTP-Posting-Host:
header.

This is basically a mail2news posting perl script. The ones
already out there dind't seem to work as well as I liked, so I
tried fiddling with them to improve them. I've got it working,
however it could be (much) better (because I know next to
nothing about PERL).

Anyway, thanks for the helps so far. I'll keep tinkering. But I
would be happy to receive any more tips!


 
Reply With Quote
 
John W. Krahn
Guest
Posts: n/a
 
      11-26-2005
Anno Siegel wrote:
> John W. Krahn <(E-Mail Removed)> wrote in comp.lang.perl.misc:
>>Tad McClellan wrote:
>>>Paul Jones <(E-Mail Removed)> wrote:
>>>
>>>>the comments header is often split over multiple lines (usually
>>>>4) and I want to include them all. The 2-4th lines of the header
>>>>all begin with spaces (usually 6 - perhaps its a tab).
>>>
>>>> print INEWS
>>>>
>>>> if /^(Date|From|Subject|Newsgroups|Comments):/i;
>>>
>>> if /^(Date|From|Subject|Newsgroups|Comments):/i or /^\s/;

>>That is going to print from every header that is folded, not just the Comments
>>header.

>
> I think it should, for those headers that are to be printed at all. I
> know that OP singled out the Comments header, but that's probably just
> because it's the only one that *has* continuation lines with any regularity.
>
> I read the problem as "Print all valid headers ( Date, From, ...) and
> their continuation lines, suppress all others, including their continuation
> lines". Could be done like this:
>
> my $saw_newsgroup;
> my $valid_header;
> while (<DATA>) {
> last if /^$/;
>
> if ( /^\S/ ) {
> $valid_header =
> s{^(Date|From|Subject|Newsgroups|Comments):}
> {\u\L$1:}i;
> $saw_newsgroup ||= $valid_header && $1 eq 'Newsgroups';


If the original header is not exactly 'Newsgroups', for example 'newsgroups'
or 'NewsGroups', then that expression will fail. Perhaps:

$saw_newsgroup ||= $valid_header && lc( $1 ) eq 'newsgroups';

> }
> print if $valid_header;
> }
> die "didn't get newsgroup from headers\n" unless $saw_newsgroup;
>
> print "\n";
> print while <DATA>; # gobble rest of message
>
> Besides adding the logic to control $valid_header, I have used the
> s/// operation that normalizes header spelling to determine if
> we have a valid header. This way, the list of valid headers appears
> only once in the code, while the original code had it twice. A
> maintainer will appreciate that.



John
--
use Perl;
program
fulfillment
 
Reply With Quote
 
Anno Siegel
Guest
Posts: n/a
 
      11-26-2005
John W. Krahn <(E-Mail Removed)> wrote in comp.lang.perl.misc:
> Anno Siegel wrote:


[...]

> > $valid_header =
> > s{^(Date|From|Subject|Newsgroups|Comments):}
> > {\u\L$1:}i;
> > $saw_newsgroup ||= $valid_header && $1 eq 'Newsgroups';

>
> If the original header is not exactly 'Newsgroups', for example 'newsgroups'
> or 'NewsGroups', then that expression will fail. Perhaps:
>
> $saw_newsgroup ||= $valid_header && lc( $1 ) eq 'newsgroups';
>
> > }
> > print if $valid_header;
> > }
> > die "didn't get newsgroup from headers\n" unless $saw_newsgroup;


You are right, I noticed that bug myself after posting. I thought of the
same fix too.

It is remarkable (and comforting) how carefully code is read that is
posted here.

Anno
--
If you want to post a followup via groups.google.com, don't use
the broken "Reply" link at the bottom of the article. Click on
"show options" at the top of the article, then click on the
"Reply" at the bottom of the article headers.
 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
XSD to allow A-Z 0-9 and spaces, but not leading/trailing spaces johkar XML 2 12-10-2009 09:24 AM
[CSS] how can I show spaces as spaces? Tomasz Chmielewski HTML 21 09-10-2009 06:43 PM
how can I show spaces as spaces, part 2 Tomasz Chmielewski HTML 14 09-10-2009 03:54 PM
Re: How to trim a String trailing spaces, but not leading spaces? Roedy Green Java 3 09-14-2008 02:10 AM
Re: How to trim a String trailing spaces, but not leading spaces? John B. Matthews Java 4 09-12-2008 05:28 AM



Advertisments