Beginner: read $array with line breaks line by line

 
 
Marek Stepanek
 
      08-27-2006

Hello happy Perlers,


my aim is to transform a large address list in HTML into a LaTeX address list
for a series (?) letter (=the same letter with many different addresses).

You can find my HTML address list at the following address:

http://podiuminternational.org/addre...tionsfunds.htm

I read this file in without line breaks, that is, I read the whole file as
one string (as far as I understand it; I am a beginner) by setting $/ to
undef. First I read in the file and remove the HTML; the result is the
array @complete_address. The next step is to transform the entries of
@complete_address into a LaTeX file of the form:

\addrentry
{Lastname}
{Firstname}
{Address}
{Telephone}
{F1 } = m (male, "männlich"), w (female, "weiblich"), u (neuter or unknown, "Zwitter oder unbekannt")
{F2 } = company
{F3 } = email
{F4 } = comment
{KEY} = key

I find these LaTeX entries far too short. If somebody has a Perl solution
for a series letter in LaTeX, I would really appreciate a hint! So this part
is my question, and it is still in progress. The problem is: each element of
@complete_address is a string containing several lines, which I would like
to read line by line. So I set $/ = "\n"; but this does not seem to work.

And a construct of

foreach my $addr (@competitions)
{ $/ = "\n";
while (<$addr>)
{
...
}
}

does not seem to be valid Perl.


Thank you for your patience


marek


Here is my script so far:


#! /usr/bin/perl

use strict;
use warnings;
use HTML::Entities;

$/ = undef;

my (@competitions, @complete_address);

while (<>)
{
foreach my $entry (m"<dd>(.+?)</dd>"g)
{
push (@complete_address, $entry);
}
}

foreach my $e (@complete_address)
{
$e =~ s!<span\s+class="comp2">([^<]+)</span>!"Competition: " . $1 .
"\n\n"!ge;
$e =~ s!<br />!\n!g;
$e =~ s!<[^>]+>!!g;
push (@competitions, $e);
}

my $out_file1 = 'letter_comp_addr_01.adr';
open OUT1, "> $out_file1" or die "Cannot create your out_file: $!";
my $out_file2 = 'letter_comp_addr_02.adr';
open OUT2, ">> $out_file2" or die "Cannot create your out_file: $!";

my ($competition, $email, $first_name, $last_name, $gender);
foreach my $addr (@competitions)
{
$/ = "\n";
($competition) = $addr =~ m"^Competition:\s+(.+)";
$addr =~ s/^(International|National) Competition\s*$//i; # not working
($gender, $first_name, $last_name) = $addr =~
/^(Mr\.?|Mrs\.?)?([A-Z][a-z]+(?:\s+[A-Z][a-z]+\.?)?)\s+([A-Z][a-z]+)\s*$/;
# not working either
if ($gender)
{
if ($gender eq m/Mrs\.?/ )
{
$gender = "w";
}
elsif ($gender eq m/Mr\.?/ )
{
$gender = "m";
}
elsif ($gender == 'undef' )
{
$gender = "u";
}
}
($email) = $addr =~ m"((&#\d+;)+)";
$email = decode_entities($email) if $email;

}

print OUT1 join ("\n\n", @competitions);
print OUT1 "\n\n";
print OUT2 "\\addrentry\n";
print OUT2 "\t{$first_name}\n" if $first_name;
print OUT2 "\t{$last_name}\n" if $last_name;
print OUT2 "\t{$competition}\n";
print OUT2 "\t{$gender}\n" if $gender;
print OUT2 "\t{$email}\n" if $email;

close OUT1;
close OUT2;

 
 
 
 
 
Peter J. Holzer
 
      08-27-2006
On 2006-08-27 13:40, Marek Stepanek <(E-Mail Removed)> wrote:
> The problem is: each element of @complete_address is a string containing
> several lines, which I would like to read line by line. So I set
> $/ = "\n"; but this does not seem to work.
>
> And a construct of
>
> foreach my $addr (@competitions)
> { $/ = "\n";
> while (<$addr>)
> {
> ...
> }
> }
>
> does not seem to be valid Perl.


The <> operator reads from a file handle, not a string. You probably
want to use the split function here:

foreach my $addr (@competitions)
{
    foreach (split(/\n/, $addr))
    {
        ...
    }
}
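
To make the split suggestion concrete, here is a minimal sketch. The sample string is only an assumption standing in for one element of @competitions; the real data comes from the HTML file.

```perl
use strict;
use warnings;

# Hypothetical sample entry, standing in for one element of @competitions.
my $addr = "Competition: Gradus ad Parnassum\n\nNational competition\nMrs. Barbara Schierl\n";

# split returns one list element per line; an empty line in the middle
# becomes an empty string, and trailing empty fields are dropped.
my @lines = split /\n/, $addr;

foreach my $line (@lines) {
    next if $line eq '';    # skip blank lines
    print "line: $line\n";
}
```

No file handle and no change to $/ is involved; the string is simply cut at every newline.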


hp


--
   _  | Peter J. Holzer    | > Why should one invent something that does
|_|_) | Sysadmin WSR       | > not exist?
| |   | (E-Mail Removed)   | What else would be the point of inventing?
__/   | http://www.hjp.at/ |    -- P. Einstein and V. Gringmuth in desd
 
 
 
 
 
Tad McClellan
 
      08-27-2006
Marek Stepanek <(E-Mail Removed)> wrote:

> for a series (?) letter (=the same letter with many different addresses).
        ^^^^^^^^^^^^^^^^^


I think the term you are looking for is "mail merge".


--
Tad McClellan SGML consulting
(E-Mail Removed) Perl programming
Fort Worth, Texas
 
 
Marek Stepanek
 
      08-27-2006
On 27.08.2006 16:35, in article (E-Mail Removed),
"Michele Dondi" <(E-Mail Removed)> wrote:

> foreach my $addr (@competitions) {
> open my $fh, '<', \$addr or die "D'Oh! $!\n";
> local $/ = "\n";
> while (<$fh>) {
> # ...
> }
> }
>
> BUT BEFORE I GET CHASTISED FOR POINTING YOU TO THIS "SOLUTION", let me
> tell you that you DON'T want to do so. You most probably want to
> split() on \n, instead.
>
>
> Michele


Looks funny your trick!

Thank you Michele, Thank you Peter,


for your answers. Something is not working. I am understanding, what you
mean with split(/\n/, $addr)). But my script is hanging now! So I suppose,
the global Variable, which I inserted $_ is not working on it?

I am sure there is an obvious mistake; sorry to bother you again with this
long script:


#! /usr/bin/perl

use strict;
use warnings;
use HTML::Entities;

$/ = undef;

my (@competitions, @complete_address);

while (<>)
{
foreach my $entry (m"<dd>(.+?)</dd>"g)
{
push (@complete_address, $entry);
}
}

foreach my $e (@complete_address)
{
$e =~ s!<span\s+class="comp2">([^<]+)</span>!"Competition: " . $1 .
"\n\n"!ge;
$e =~ s!<br />!\n!g;
$e =~ s!<[^>]+>!!g;
push (@competitions, $e);
}

my $out_file1 = 'letter_comp_addr_01.adr';
open OUT1, "> $out_file1" or die "Cannot create your out_file: $!";
my $out_file2 = 'letter_comp_addr_02.adr';
open OUT2, ">> $out_file2" or die "Cannot create your out_file: $!";

my ($competition, $email, $first_name, $last_name, $gender);
foreach my $addr (@competitions)
{
foreach (split(/\n/, $addr)) #<-- did I understand it well?
{
($competition) = $_ =~ m"^Competition:\s+(.+)";
$_ =~ s/^(International|National) Competition\s*$//i;
($gender, $first_name, $last_name) = $_ =~
/^(Mr\.?|Mrs\.?)?([A-Z][a-z]+(?:\s+[A-Z][a-z]+\.?)?)\s+([A-Z][a-z]+)\s*$/;
if ($gender)
{
if ( $gender eq m/Mrs\.?/ )
{
$gender = "w";
}
elsif ( $gender eq m/Mr\.?/ )
{
$gender = "m";
}
elsif ( $gender == 'undef' )
{
$gender = "u";
}
}
($email) = $_ =~ m"((&#\d+;)+)";
$email = decode_entities($email) if $email;
}
}

print OUT1 join ("\n\n", @competitions);
print OUT1 "\n\n";
print OUT2 "\\addrentry\n";
print OUT2 "\t{$first_name}\n" if $first_name;
print OUT2 "\t{$last_name}\n" if $last_name;
print OUT2 "\t{$competition}\n" if $competition;
print OUT2 "\t{$gender}\n" if $gender;
print OUT2 "\t{$email}\n" if $email;

close OUT1;
close OUT2;

 
 
Peter J. Holzer
 
      08-27-2006
On 2006-08-27 16:30, Marek Stepanek <(E-Mail Removed)> wrote:
> On 27.08.2006 16:35, in article (E-Mail Removed),
> "Michele Dondi" <(E-Mail Removed)> wrote:
>
>> foreach my $addr (@competitions) {
>> open my $fh, '<', \$addr or die "D'Oh! $!\n";
>> local $/ = "\n";
>> while (<$fh>) {
>> # ...
>> }
>> }
>>
>> BUT BEFORE I GET CHASTISED FOR POINTING YOU TO THIS "SOLUTION", let me
>> tell you that you DON'T want to do so. You most probably want to
>> split() on \n, instead.
>>
>>
>> Michele

>
> Looks funny your trick!
>
> Thank you Michele, Thank you Peter,
>
>
> for your answers. Something is not working. I am understanding, what you
> mean with split(/\n/, $addr)). But my script is hanging now!


I don't see where your script could "hang" except while reading its
input file, and you didn't change that. It terminates just fine if I
invoke it as

./marek competitionsfunds.htm

Of course, if you omit the file, it will read from STDIN, so you will
have to type in the HTML file.


> So I suppose, the global Variable, which I inserted $_ is not working
> on it?


I don't think I understand that sentence.

> foreach (split(/\n/, $addr)) #<-- did I understand it well?
> {

....
> }


split returns a list of the lines in $addr. The loop will run once for
each line, with $_ set to each line in turn (a "for (@list)" loop without
a named variable implicitly localizes $_ and aliases it to each element).
So if $addr contains something like

"Competition: Gradus ad Parnassum


National competition
Mrs. Barbara Schierl
Musik der Jugend
Promenade 37
A-4021 Linz
Fon: +43 732772015483
...."

$_ will be "Competition: Gradus ad Parnassum" during the first run of
the loop, "" during the second and third, "National competition" during
the fourth, etc.


> I am sure there is an obvious mistake;


There are a few obvious mistakes in your script, but none that would
cause it to hang.

> my ($competition, $email, $first_name, $last_name, $gender);
> foreach my $addr (@competitions)
> {

[...]
> }
>
> print OUT1 join ("\n\n", @competitions);
> print OUT1 "\n\n";
> print OUT2 "\\addrentry\n";
> print OUT2 "\t{$first_name}\n" if $first_name;
> print OUT2 "\t{$last_name}\n" if $last_name;
> print OUT2 "\t{$competition}\n" if $competition;
> print OUT2 "\t{$gender}\n" if $gender;
> print OUT2 "\t{$email}\n" if $email;
>


There are a lot of addresses in your input file, yet you write only one to your
output file. Since you wrote earlier that you wanted to create a serial
letter (question to native speakers: is serial letter the right word?),
I guess you want all of them, so you have to move the print statements
into the loop.


> foreach (split(/\n/, $addr)) #<-- did I understand it well?
> {
> ($competition) = $_ =~ m"^Competition:\s+(.+)";
> $_ =~ s/^(International|National) Competition\s*$//i;
> ($gender, $first_name, $last_name) = $_ =~
> /^(Mr\.?|Mrs\.?)?([A-Z][a-z]+(?:\s+[A-Z][a-z]+\.?)?)\s+([A-Z][a-z]+)\s*$/;


You assign a value to the variables $competition, $gender, etc. on every
run through the loop. After the loop you will have only the information
from the last line. You should assign these variables only if the
information you look for is in the line you are currently processing,
e.g.:

$competition = $1 if /^Competition:\s+(.+)/;
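
The difference between the two forms can be sketched as follows. The three sample lines and the variable names $overwritten and $kept are only illustrative assumptions.

```perl
use strict;
use warnings;

# Hypothetical lines of one address block.
my @lines = ('Competition: Gradus ad Parnassum', '', 'Mrs. Barbara Schierl');

my ($overwritten, $kept);
foreach (@lines) {
    # Unconditional list assignment: a line that does not match
    # resets the variable to undef.
    ($overwritten) = /^Competition:\s+(.+)/;

    # Conditional assignment: the value survives non-matching lines.
    $kept = $1 if /^Competition:\s+(.+)/;
}

print defined $overwritten ? "overwritten: $overwritten\n" : "overwritten: (undef)\n";
print "kept: $kept\n";
```

After the loop only $kept still holds the competition name, because the last line did not match.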

I think you are making it more difficult by splitting the
address block into lines. Just use regexps to extract the data you are
interested in from $addr.
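
A minimal sketch of that approach, assuming an address block like the example shown earlier in this post. The patterns are deliberately simple illustrations, not a complete solution for every entry in the real file.

```perl
use strict;
use warnings;

# Hypothetical address block modelled on the example above.
my $addr = "Competition: Gradus ad Parnassum\n\n\nNational competition\n"
         . "Mrs. Barbara Schierl\nMusik der Jugend\nPromenade 37\n"
         . "A-4021 Linz\nFon: +43 732772015483\n";

# The /m modifier lets ^ and $ match at the start and end of every
# line inside the string, so no splitting is needed.
my ($competition) = $addr =~ /^Competition:\s+(.+)$/m;
my ($title, $first_name, $last_name)
    = $addr =~ /^(Mrs?\.?)[ \t]+(\S+)[ \t]+(\S+)[ \t]*$/m;
my ($phone) = $addr =~ /^Fon:\s*(.+)$/m;

print "$competition / $title $first_name $last_name / $phone\n";
```

Each variable is declared and filled once per address block, so nothing is carried over from a previous line.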


> if ($gender)
> {

[...]
> elsif ( $gender == 'undef' )


There are two errors in this line. First, the undefined value is not the
same as the string 'undef'. To check if $gender is undef you would have
to write

elsif ( !defined($gender) )

Second, if you really wanted to compare $gender to the string 'undef',
you would have to use the string comparison operator eq, not the
numerical comparison operator ==.

Oh, and third, if $gender is true, it has to be defined, so the test is
useless as it can never succeed.
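
One possible way to arrange the tests so that each branch can actually be reached is sketched below. The helper name gender_code is hypothetical; the letters m, w and u are the codes from the original post.

```perl
use strict;
use warnings;

# Hypothetical helper: maps a captured title to a one-letter code.
sub gender_code {
    my ($title) = @_;
    return 'u' unless defined $title;     # undef: no title was captured
    return 'w' if $title =~ /^Mrs\.?$/;
    return 'm' if $title =~ /^Mr\.?$/;
    return 'u';                           # any other string is unknown
}

print gender_code('Mrs.'), "\n";
print gender_code('Mr'), "\n";
print gender_code(undef), "\n";
print gender_code('undef'), "\n";   # the string 'undef' is an ordinary defined value
```

The defined test comes first, before any check that requires a true value.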

hp


 
 
Tad McClellan
 
      08-27-2006
Marek Stepanek <(E-Mail Removed)> wrote:

> $e =~ s!<span\s+class="comp2">([^<]+)</span>!"Competition: " . $1 .
> "\n\n"!ge;



The replacement string part of s/// is "double quotish" so you
get backslash escapes (\n) and interpolation ($1) for free.

No need for the eval (e) modifier:

$e =~ s!<span\s+class="comp2">([^<]+)</span>!Competition: $1\n\n!g;
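
A quick way to check that both forms give the same result is sketched here; the sample HTML fragment is an assumption.

```perl
use strict;
use warnings;

# Hypothetical input fragment.
my $html = '<span class="comp2">Gradus ad Parnassum</span>';

# With /e the replacement is evaluated as a Perl expression.
(my $with_e = $html)
    =~ s!<span\s+class="comp2">([^<]+)</span>!"Competition: " . $1 . "\n\n"!ge;

# Without /e the replacement is interpolated like a double-quoted string.
(my $without_e = $html)
    =~ s!<span\s+class="comp2">([^<]+)</span>!Competition: $1\n\n!g;

print $with_e eq $without_e ? "same result\n" : "different result\n";
```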


 
 
Marek Stepanek
 
      08-27-2006
On 27.08.2006 21:48, in article (E-Mail Removed),
"Tad McClellan" <(E-Mail Removed)> wrote:

> $e =~ s!<span\s+class="comp2">([^<]+)</span>!Competition: $1\n\n!g;


Thank you all for all the answers! So far I get only the competitions, and
here and there some email addresses, in my output file. But tomorrow I will
probably find the mistake(s) myself. (I am online only in the evening.) I am
learning an enormous amount from all your hints. So far my script looks as
follows:

#! /usr/bin/perl

use strict;
use warnings;
use HTML::Entities;

$/ = undef;

my (@competitions, @complete_address);

while (<>)
{
foreach my $entry (m"<dd>(.+?)</dd>"g)
{
push (@complete_address, $entry);
}
}

foreach my $e (@complete_address)
{
$e =~ s!<span\s+class="comp2">([^<]+)</span>!Competition: $1\n\n!g;
$e =~ s!<br />!\n!g;
$e =~ s!<[^>]+>!!g;
push (@competitions, $e);
}

my $out_file1 = 'letter_comp_addr_01.adr';
open OUT1, "> $out_file1" or die "Cannot create your out_file: $!";
my $out_file2 = 'letter_comp_addr_02.adr';
open OUT2, ">> $out_file2" or die "Cannot create your out_file: $!";

print OUT1 join ("\n\n", @competitions);
print OUT1 "\n\n";

my ($competition, $email, $first_name, $last_name, $gender, $phone);
foreach my $addr (@competitions)
{
foreach (split(/\n/, $addr))
{
($competition) = $1 if m/^Competition:\s+(.+)/;
s/^(International|National) Competition\s*$//i;
($gender, $first_name, $last_name) = $_ =~
/^(Mr\.?|Mrs\.?\s+)?([A-Z][a-z]+(?:\s+[A-Z][a-z]+\.?)?)\s+([A-Z][a-z]+(?:[-A-Z][a-z]+)?)\s*$/;
# this regex needs some refinement ... in work ...
# need some ideas for address and phone numbers again ... in work ...
if ($gender)
{
if ( $gender =~ m/Mrs\.?/ )
{
$gender = "w";
}
elsif ( $gender =~ m/Mr\.?/ )
{
$gender = "m";
}
else
{
$gender = "u";
}
}
($email) = $_ =~ m"((&#\d+;)+)";
$email = decode_entities($email) if $email;
}
if ($competition)
{
print OUT2 "\\addrentry\n";
print OUT2 "\t{$first_name}\n" if $first_name;
print OUT2 "\t{$last_name}\n" if $last_name;
print OUT2 "\t{$competition}\n";
print OUT2 "\t{$gender}\n" if $gender;
print OUT2 "\t{$email}\n" if $email;
}
}

close OUT1;
close OUT2;

 
 
Tad McClellan
 
      08-27-2006
Marek Stepanek <(E-Mail Removed)> wrote:

> foreach my $entry (m"<dd>(.+?)</dd>"g)
> {
> push (@complete_address, $entry);
> }



You can replace that whole foreach loop with:

push @complete_address, m"<dd>(.+?)</dd>"g;

There is no need to put them in one-at-a-time.
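
A small sketch of why this works: in list context a match with the g modifier returns all captured substrings at once. The sample HTML string is an assumption.

```perl
use strict;
use warnings;

# Hypothetical input with two entries.
my $html = '<dl><dd>first entry</dd><dd>second entry</dd></dl>';

my @complete_address;
# push supplies list context, so m//g returns every capture in one go.
push @complete_address, $html =~ m"<dd>(.+?)</dd>"g;

print scalar(@complete_address), " entries\n";
```

Note that "." does not match a newline by default, so an entry that spans several lines in the HTML would also need the s modifier.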



> open OUT1, "> $out_file1" or die "Cannot create your out_file: $!";



The error message should contain the name of the file:

open OUT1, '>', $out_file1 or die "Cannot create your file '$out_file1': $!";


 
 
Brian McCauley
 
      08-28-2006

Marek Stepanek wrote:
> foreach my $e (@complete_address)
> {
> $e =~ s!<span\s+class="comp2">([^<]+)</span>!Competition: $1\n\n!g;
> $e =~ s!<br />!\n!g;
> $e =~ s!<[^>]+>!!g;
> push (@competitions, $e);
> }


Note that the loop control variable, $e, is an _alias_ to the elements of
@complete_address, not a copy of them, so at the end of that loop
@competitions and @complete_address will have the same content (given
that @competitions was empty initially).

You can therefore discard one of them.
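
The aliasing can be seen in a small sketch like the following; the two sample strings are assumptions.

```perl
use strict;
use warnings;

# Hypothetical sample data.
my @complete_address = ('a<br />b', '<b>c</b>');
my @competitions;

foreach my $e (@complete_address) {
    $e =~ s!<br />!\n!g;     # modifies the array element itself, not a copy
    $e =~ s!<[^>]+>!!g;
    push @competitions, $e;  # pushes a copy of the already-modified element
}

print "arrays are identical\n" if "@complete_address" eq "@competitions";
```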

This is also a case where using the implicit $_ would look tidier.

foreach (@complete_address)
{
s!<span\s+class="comp2">([^<]+)</span>!Competition: $1\n\n!g;
s!<br />!\n!g;
s!<[^>]+>!!g;
}

But really, unless you have very tight control over the input file
format, you should be using a real HTML parser.

 
 
Marek Stepanek
 
      08-28-2006


I don't know how to thank you for all your input. I came back this evening
intending to submit my new script once again, but now I realize I first have
to digest all your suggestions. So I will probably be back tomorrow evening.

greetings from Munich


marek

 