Q: Analyse data and provide a report - Arrays?

 
 
Ga Mu
      09-02-2003
Troll wrote:

> Now time for some stupid Qs:
>
> Let's say that the data I have is in a file called employees.
> How can I call this file so that I can parse it?
>
> 1) Can I do:
> @HRdata = `cat employees`;
> while (<@HRdata>) {


The above is considered bad practice, especially if the file is large.
Why read the entire file into memory when you can read, process, and
discard a line at a time..? To open and read a file:

open (FIN, '<employees') || die "Can't open employees: $!";

while (<FIN>) {


}
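
A minimal sketch of the line-at-a-time loop above, written with a lexical filehandle and three-argument open. So that it runs without an employees file on disk, it reads from an in-memory string; the sample rows and the names $sample_data and $line_count are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the 'employees' file.
my $sample_data = "Pete Male Single 31\nAnn Female Married 40\n";

# To read the real file, open it by name instead:
#   open(my $fin, '<', 'employees') or die "Can't open employees: $!";
open(my $fin, '<', \$sample_data) or die "Can't open sample data: $!";

my $line_count = 0;
while (my $line = <$fin>) {
    chomp $line;        # strip the trailing newline
    $line_count++;      # process the line here, then move on
    print "line $line_count: $line\n";
}
close $fin;
```

Only one line is held in memory at a time, which is the point of reading this way rather than slurping the whole file into an array.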

>
>
> 2) With regard to the HEADING sections, the script has to be able to
> recognise the different sections by the following rules:
> # there's a blank line before each heading
> HEADING 1 # this is the name of the heading - this is a string with a
> # special character and a blank space as part of it
> ColumnA ColumnB ColumnC # these are the column names - these are strings
> # which can also include a blank space if they have 2 or more words
> ******* # a sort of an underlining pattern
>


while (<FIN>) {

if ( /^$/ ) {

# this is a blank line, don't do anything

} elsif ( /HEADING (.+)/ ) {

# this is a heading, with the heading name in $1

} elsif ( (($name, $sex, $status, $age) = /(\S+) (\S+) (\S+) (\d+)/) == 4 ) {

# this line contains three words and a number, do whatever
# (I'm not really sure if this will work. My Linux box is
# down and I have no way of testing.)

}

} # end of while(<FIN>)
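
For reference, here is a small self-contained sketch of the same if/elsif chain that can be run as-is. The report text, the single-word heading name and the four-column layout are assumptions made up for the example, not the real file format; adjust the patterns to the actual data.

```perl
use strict;
use warnings;

# Hypothetical report text; the layout is assumed, not taken from the real file.
my $report = <<'END';

HEADING NAMES
Name Sex Status Age
*******
Pete Male Single 31
Ann Female Married 40
END

open(my $fin, '<', \$report) or die "Can't open sample report: $!";

my $heading = '';
my @records;

while (my $line = <$fin>) {
    chomp $line;
    if ($line =~ /^\s*$/) {
        next;                           # blank line
    } elsif ($line =~ /^HEADING (.+)/) {
        $heading = $1;                  # remember which section we are in
    } elsif ($line =~ /^\*+$/) {
        next;                           # underlining row
    } elsif (my ($name, $sex, $status, $age) =
                 $line =~ /^(\S+) (\S+) (\S+) (\d+)$/) {
        push @records, "$heading: $name, $sex, $status, $age";
    }
    # anything else (e.g. the column-name row) falls through and is ignored
}
close $fin;

print "$_\n" for @records;
```

The column-name row is skipped here only because its last field is not a number; a real report may need an explicit test for it.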

> I guess this is to make sure that one does not include any silly heading
> data as part of the arrays created and the parsing only takes place on
> 'real' data. Can you pls advise? Or do you need more info? I'm more in
> favour of creating separate 'if' loops due to my 'newbie' status. I'll get
> lost otherwise...
>


"if loops"...? How does one make an if loop?

> Thanks.
>
>
>
> "Troll" <(E-Mail Removed)> wrote in message
> news:uRK4b.77094$(E-Mail Removed)...
>
>>Wow. I don't know how you get the time to respond to my queries in such
>>detail. It is greatly appreciated.
>>I just came back from work and it's like 2:30 am so I'll crash out soon and
>>have a closer read tomorrow [especially of the HEADINGS part].
>>
>>With the push @array stuff I actually got to this today in my readings. I
>>saw an example of appending an array onto another array with a push and I
>>was wondering if we could just substitute a $variable for one of the arrays.
>>I'm glad you confirmed this.
>>
>>I was also wondering if doing this at the beginning of the script:
>>
>>my (%names, %sexes, %depts, %m_statuses, %ages) # declaring things locally
>>
>>would be considered bad practice. I thought that one should declare things
>>as my ( ) if one is using things within a loop so as not to impact anything
>>external to the loop. But if one uses variables/arrays both within and
>>outside the loops, should we then still declare stuff as my ( )?
>>Maybe I'm just confused about my ( )...
>>
>>Greg, if you could possibly keep an eye on this thread for the next few days
>>I would be very much in your debt. Your help has been invaluable so far in
>>allowing me to visualise quite a few things.
>>
>>Thanks very much.
>>
>>
>>"Ga Mu" <(E-Mail Removed)> wrote in message
>>news:uRJ4b.147542$(E-Mail Removed) .net...
>>
>>>Troll wrote:
>>>
>>>>Thanks again !
>>>>
>>>>1)
>>>>Sorry for being too vague. With regard to the HEADINGS they separate blocks
>>>>of data. But because the column names will be different [data is different]
>>>>then I'm not quite sure I could use:
>>>>$names{$heading}{$name}++;
>>>>
>>>>So I'm looking at creating separate my () definitions for each HEADING and
>>>>just wanted to confirm how to jump out of one HEADING loop and start with
>>>>the next.
>>>>
>>>>For example, under HEADING 1 we have these columns:
>>>>Name, Sex, Dept, M_Status, Age
>>>>
>>>>and under HEADING 2 we have:
>>>>Address, Phone#, Mobile#, Salary
>>>>
>>>>So at the beginning of the script I would have
>>>>my (%names, %sexes, %depts, %m_statuses, %ages)
>>>>my (%addresses, %phones, %mobiles, %salaries)
>>>>#then I have my while (<>) and parsing here
>>>>#I have my output at the end
>>>>
>>>>Is that a little clearer?
>>>
>>>Yes. Much clearer. There are a couple of different ways you could do
>>>this. One is to use a single loop that reads through the file and uses
>>>a state variable (e.g., $heading) to keep track of where you are in the
>>>parsing process. The other is to have a separate loop for each heading.
>>> Again, six of one, half a dozen of another. It's more a matter of
>>>preference than anything else.
>>>
>>>An example of the first approach:
>>>
>>>my $heading = 'initial';
>>>my $fin_name = '/usr/local/blah/blah/blah';
>>>open(FIN, $fin_name) || die "Can't open $fin_name: $!\n";
>>>
>>>while (<FIN>) {
>>>
>>> # check for a new heading
>>> # I am assuming single word heading names
>>> if ( /HEADING (\S+)/ ) {
>>>
>>> $heading = $1; # set $heading equal to word extracted above
>>>
>>> # take appropriate action based on the heading we are under
>>>
>>> } elsif ( $heading eq 'NAMES' ) {
>>>
>>> ( $name, $sex, $dept, $m_status, $age ) =
>>> /(\w+) (\w+) (\w+) (\w+) (\d+)/;
>>>
>>> # update counts, append to lists, etc...
>>>
>>> } elsif ( $heading eq 'ADDRESSES' ) {
>>>
>>> # I am assuming the address field is limited to 30 characters
>>> # here:
>>> ( $address,$phone, $mobile, $salary ) =
>>> /(.{30}) (\S+) (\S+) (\d+)/;
>>>
>>> # update counts, append to lists, etc...
>>>
>>> }
>>>
>>>}
>>>
>>>
>>>And the second approach:
>>>
>>>my $heading = 'initial';
>>>my $fin_name = '/usr/local/blah/blah/blah';
>>>open(FIN, $fin_name) || die "Can't open $fin_name: $!\n";
>>>
>>># scan for first heading
>>># (<FIN> only assigns to $_ when it is the sole condition of a while,
>>># so the assignment is spelled out here)
>>>while ( defined($_ = <FIN>) && ! /HEADING NAMES/ ) { }
>>>
>>># parse the names, etc...
>>>while ( defined($_ = <FIN>) && ! /HEADING ADDRESSES/ ) {
>>>
>>> ( $name, $sex, $dept, $m_status, $age ) =
>>> /(\w+) (\w+) (\w+) (\w+) (\d+)/;
>>>
>>> # update counts, append to lists, etc...
>>>
>>>}
>>>
>>># parse the addresses, etc...
>>># for brevity, I am assuming only two headings
>>>while ( <FIN> ) {
>>>
>>> ( $address, $phone, $mobile, $salary ) =
>>> /(.{30}) (\S+) (\S+) (\d+)/;
>>>
>>> # update counts, append to lists, etc...
>>>
>>>}
>>>
>>>
>>>>
>>>>2)
>>>>With my last question regarding the printing of the names of single people,
>>>>if we include a print statement in the parsing loop would that give us
>>>>something like:
>>>>Pete is single.
>>>>John is single.
>>>>while the parsing is still running?
>>>
>>>Yes.
>>>
>>>
>>>>What I'm after is hopefully feeding that output into something else
>>>>[@array?] which can then print a list of the names [line by line] at the end
>>>>of the script, something like:
>>>>#this is the output structure
>>>>Number of Petes =
>>>>Number of Males =
>>>>Singles are:
>>>>Pete
>>>>John
>>>>Number of Salespeople =
>>>>
>>>>
>>>>Does this make sense?
>>>>
>>>
>>>Yes. It would be easy to create a list/array of, e.g., single people.
>>>Prior to the loop, declare the array. Within the loop, test each person
>>>for being single. If they are, push them onto the list:
>>>
>>># prior to your parsing loop, declare array @singles:
>>>
>>>my @singles;
>>>
>>># within your parsing loop, after parsing out name, status, etc.:
>>>
>>>push @singles, $name if $m_status eq 'Single';
>>>
>>># after loop, to print the list of singles:
>>>
>>>print "Single persons:\n";
>>>foreach $single_person ( @singles ) { print " $single_person\n"; }
>>>
>>>
>>>Greg
 
Troll
      09-02-2003
Thanks again

Will I get these errors:
Use of uninitialized value in print at ./netstat.pl line 16, <NET> line 1.
Use of uninitialized value in print at ./netstat.pl line 17, <NET> line 1.
Use of uninitialized value in print at ./netstat.pl line 18, <NET> line 1.
....etc

if an undefined value is passed, for example, to $UDP4localaddress?
Because if that's the case then all I need to do is to make sure that
whatever I'm passing as part of the m()// is correctly split and defined as
a string, digit, word etc, yes?


"Ga Mu" <(E-Mail Removed)> wrote in message
news:SM25b.251663$cF.79266@rwcrnsc53...
> Troll wrote:
> > Greg,
> > I decided to give you a glimpse at the code itself so as to make it clearer.
> > Just be aware that the variable/array names have changed but the general
> > idea is the same.
> > The hash errors refer to the variables in the increment section.
> >
> > #!/usr/bin/perl -w
> >
> > open(NET, "netstat|") || die ("Cannot run netstat: $!");
> >
> > my(%UDP4localaddresses, %UDP4remoteaddresses, %UDP4states);
> >
> > $UDP4localaddress = '0';
> > $UDP4remoteaddress = '0';
> > $UDP4state = '0';
> >

>
> Why are you doing this (above)? This is initializing three variables to
> zero. These three variables have nothing to do with the three variables
> of the same name in the while loop.
>
> > $UDP4localaddresses = '0';
> > $UDP4remoteaddresses = '0';
> > $UDP4states = '0';
> >

>
> Why are you doing this (above)? This is initializing three scalars to
> zero. These three scalars have the same name, but have nothing else to
> do with the hashes of the same name.
>
> > $UDP4localaddresses{$UDP4localaddress} = '0';
> > $UDP4remoteaddresses{$UDP4remoteaddress} = '0';
> > $UDP4states = ($UDP4state} = '0';
> >

>
> Hash entries do not need initializing: a key that does not exist yet is
> treated as zero the first time you increment it. That is what makes
> hashes perfect for counting occurrences of unknown words, numbers, etc.
> And even if you had to initialize them, you are initializing
> $UDP4localaddresses{0} to zero.
>
> > while (<NET>) {
> > my($UDP4localaddress, $UDP4remoteaddress, $UDP4state)=
> > /(\s+) (\s+) (\s+)$/;
> >
> > #increments start here
> > $UDP4localaddresses{$UDP4localaddress}++;
> > $UDP4remoteaddresses{$UDP4remoteaddress}++;
> > $UDP4states = ($UDP4state}++;

>
> If the increments above are failing, it is probably because your m// is
> failing and one or more of the keys (variable inside the {}) are
> undefined. Try putting a print statement before the increments and
> print each of the variables you are extracting, then play with the
> regular expression until you get values for ALL of them.
>
> > }
> >
> > #here comes the output
> >
> >
> > Can you pls criticise my futile attempt to get this going? As one can see,
> > I'm not that clear on initializations...
> >
> >

>



 
 
John Bokma
      09-02-2003
Ga Mu wrote:


> while (<FIN>) {
>
> if ( /^$/ ) {
>
> # this is a blank line, don't do anything



next if /^\s*$/; # skip blank lines (or lines consisting of white space only)

> } elsif ( /HEADING (.+)/ ) {
>
> # this is a heading, with the heading name in $1



if (/ .....) {

# this is a heading
next;
}

> } elsif ( (($name, $sex, $status, $age) = /(\S+) (\S+) (\S+) (\d+)/) ==



if (......) {

# bla bla
next;
}

next skips the rest of the loop body and moves on to the next iteration of
the while loop.
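
A small runnable sketch of that pattern, for illustration only. It loops over a made-up array instead of a filehandle, but next behaves the same way inside while (<FIN>); the names @lines, @headings and @data are invented for the example.

```perl
use strict;
use warnings;

# Made-up input lines for illustration.
my @lines = ("", "   ", "HEADING NAMES", "Pete Male Single 31");
my (@headings, @data);

for (@lines) {
    next if /^\s*$/;        # blank or whitespace-only: skip it

    if (/^HEADING (.+)/) {
        push @headings, $1;
        next;               # done with this line, go to the next one
    }

    push @data, $_;         # only reached for non-blank, non-heading lines
}

print "headings: @headings\n";
print "data: @data\n";
```

Because each branch ends with next, the tests stay flat instead of nesting inside one long if/elsif chain.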

--
Kind regards, feel free to mail: mail(at)johnbokma.com (or reply)
virtual home: http://johnbokma.com/ ICQ: 218175426
John web site hints: http://johnbokma.com/websitedesign/

 
Ga Mu
      09-02-2003
Troll wrote:

> Thanks again
>
> Will I get these errors:
> Use of uninitialized value in print at ./netstat.pl line 16, <NET> line 1.
> Use of uninitialized value in print at ./netstat.pl line 17, <NET> line 1.
> Use of uninitialized value in print at ./netstat.pl line 18, <NET> line 1.
> ...etc
>
> if an undefined value is passed, for example, to $UDP4localaddress?
> Because if that's the case then all I need to do is to make sure that
> whatever I'm passing as part of the m()// is correctly split and defined as
> a string, digit, word etc, yes?
>


Exactly. Experiment with the regular expression in your m// until every
capture returns a value.
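
One way to experiment, sketched under assumptions: test whether the match succeeded before touching the hashes, and print the lines that did not match. The sample lines below only loosely resemble netstat output (real output differs between systems), and the pattern and the names %protos, %states and $skipped are made up for the example.

```perl
use strict;
use warnings;

# Made-up lines loosely shaped like netstat output; real output varies by system.
my @lines = (
    "Proto Recv-Q Send-Q Local Address Foreign Address State",
    "udp 0 0 localhost:123 *:*",
    "tcp 0 0 localhost:22 remotehost:51000 ESTABLISHED",
    "tcp 0 0 localhost:25 otherhost:40000 TIME_WAIT",
);

my (%protos, %states);
my $skipped = 0;

for my $line (@lines) {
    # A list assignment used as a condition is true only if the match succeeded.
    if (my ($proto, $state) = $line =~ /^(tcp|udp)\s.*\s([A-Z_]+)$/) {
        $protos{$proto}++;      # a missing key counts up from zero, no warning
        $states{$state}++;
    } else {
        $skipped++;             # no match: the captures would have been undef
        print "no match: $line\n";
    }
}

print "$_ = $protos{$_}\n" for sort keys %protos;
```

Lines that fail the pattern never reach the increments, so the "uninitialized value in hash element" warning cannot be triggered by them.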


 
Ga Mu
      09-02-2003
Troll wrote:

> Thanks again. No reading files into memory from now on [unless necessary]
>
> The data will actually be read from stdin in the form of
> $ netstat | netstat.pl
> or
> $ netstat.pl < netstat
>
> Will something like this suffice?
> #!/usr/bin/perl -w
> while (<STDIN>) {


The empty <> reads from any files named on the command line, or from STDIN
if there are none, so it covers both of your cases. All you need is:

while (<>) {

}
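
A rough sketch of how <> picks its input. To keep it self-contained it writes a small temporary file and sets @ARGV by hand, which simulates naming a file on the command line; with an empty @ARGV the same loop would read STDIN. The file contents and variable names are made up.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a small made-up input file so the example is self-contained.
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
print {$tmp_fh} "tcp 0 0\nudp 0 0\n";
close $tmp_fh;

# <> reads the files named in @ARGV; with an empty @ARGV it reads STDIN.
# Setting @ARGV here simulates running the script with a file argument.
@ARGV = ($tmp_path);

my $count = 0;
while (<>) {
    chomp;
    $count++;
    print "read: $_\n";
}
```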

>>"if loops"...? How does one make an if loop?

>
> What I meant here is that I'll create 4 separate 'if' sections [with their
> own elsif branches], one for each HEADING section [there are 4 of them].
> So I think I meant 'if' statements...is that better or I am still confusing
> my terminology?


Makes more sense...

 
Troll
      09-02-2003
Thanks very much.

I'm having a bit of drama within my parsing loop.

If I'm trying to look for a specific pattern [ie. tcp] then I am able to
find it [by printing a 'found' message]. This message is then printed each
and every time 'tcp' is found [for a total of 6 times on 6 separate lines].
The script then finishes.

But if I'm trying to increment the number of times this pattern was found I
get the dreaded error:
Use of uninitialized value in hash element at ...

Here's the code extract:
while (<>) {
my($Proto)=
/(\s+)*$/;

if (/tcp/) {
print 'found';
$Protos{$Proto}++;


where am I failing ?


 
Troll
      09-03-2003
OK, I had some luck getting the first value incremented but no more.

Version which works:
*****************
if (/tcp/) {
my($Proto)=
/^(\w+)/;
$Protos{$Proto}++;
}
print "TCP = $Protos{'tcp'}\n";

#output section
TCP = 6 # all is correct here


Version which does not work:
**********************
if (/tcp/) {
my($Proto, $RecvQ)=
/^(\w+) (\s+)/;
$Protos{$Proto}++;
$RecvQs{$RecvQ)++;
}
print "TCP = $Protos{'tcp'}\n";
print "RecvQ = $RecvQs{'0'}\n";

#output section
TCP = 6 # all is correct here
Use of uninitialized value in concatenation (.) or string at... # error
time - this refers to the 2nd print statement
RecvQ = # this is blank

I have tried reading the second parameter as a (\s+) and as a (\d+) with no
luck. If you run netstat you will probably see that all items in the RecvQ
column are 0.
What have I done wrong now?

Can a number of whitespaces be represented by:
/^(\w+) (\s+)/; # this is a word followed by some spaces followed by a
string
or is the above only ONE whitespace?


 
John Bokma
      09-03-2003
Troll wrote:

> OK, I had some luck getting the first value incremented but no more.
>
> Version which works:
> *****************
> if (/tcp/) {
> my($Proto)=
> /^(\w+)/;
> $Protos{$Proto}++;
> }
> print "TCP = $Protos{'tcp'}\n";
>
> #output section
> TCP = 6 # all is correct here
>
>
> Version which does not work:
> **********************
> if (/tcp/) {
> my($Proto, $RecvQ)=
> /^(\w+) (\s+)/;
> $Protos{$Proto}++;
> $RecvQs{$RecvQ)++;
> }
> print "TCP = $Protos{'tcp'}\n";
> print "RecvQ = $RecvQs{'0'}\n";
>
> #output section
> TCP = 6 # all is correct here
> Use of uninitialized value in concatenation (.) or string at... # error
> time - this refers to the 2nd print statement
> RecvQ = # this is blank
>
> I have tried reading the second parameter as a (\s+) and as a (\d+) with no
> luck. If you run netstat you will probably see that all items in the RecvQ
> column are 0.
> What have I done wrong now?


I guess you want (\S+), i.e. non-whitespace. If it is always digits you
should use (\d+). If the number of spaces between proto and recvq can be
more than one you should use something like:

(\w+)\s+(\d+)

Print the values of $Proto and $RecvQ to check what was captured.

Also, you can't be sure there is any $RecvQs{'0'}, so check for it first;
same for $Protos.

print "TCP = ...." if defined $Protos{'tcp'};
print "RecvQ = ..." if defined $RecvQs{'0'};

> Can a number of whitespaces be represented by:
> /^(\w+) (\s+)/; # this is a word followed by some spaces followed by a
> string


Nope. \s+ means one or more whitespace characters, not a *string*. Your
pattern is a word, followed by exactly one literal space, followed by one
or more whitespace characters (which is what ends up captured).
See above.
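
To make the difference concrete, a short sketch that runs both patterns against one made-up line. The line is only shaped like a netstat row (a word, a run of spaces, then numbers); the column widths are invented.

```perl
use strict;
use warnings;

# A made-up netstat-like row: "tcp", 8 spaces, "0", 6 spaces, "0 localhost:22".
my $line = "tcp" . (" " x 8) . "0" . (" " x 6) . "0 localhost:22";

# (\s+) captures the whitespace itself. One space is used up by the literal
# space in the pattern, so the capture holds the remaining seven.
my ($word, $captured_spaces) = $line =~ /^(\w+) (\s+)/;

# With \s+ outside the parentheses the gap is skipped and (\d+) captures
# the number that follows it.
my ($proto, $recvq) = $line =~ /^(\w+)\s+(\d+)/;

print "captured spaces: ", length($captured_spaces), "\n";
print "proto=$proto recvq=$recvq\n";
```

With the first pattern a hash keyed on the second capture is keyed on a string of spaces, so looking up the key '0' afterwards finds nothing.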

HTH

--
Kind regards, feel free to mail: mail(at)johnbokma.com (or reply)
virtual home: http://johnbokma.com/ ICQ: 218175426
John web site hints: http://johnbokma.com/websitedesign/

 
Ga Mu
      09-03-2003
Troll wrote:
> Here's the code extract:
> while (<>) {
> my($Proto)=
> /(\s+)*$/;
>


Your m// above is saying find an occurrence of one or more whitespace
characters, zero or more times, terminated by an end-of-line.

> if (/tcp/) {
> print 'found';
> $Protos{$Proto}++;


This m// has nothing to do with the value, if any, that was extracted
into $Proto. It is looking at the last line read for "tcp".

I'll continue in your next post...



 
Troll
      09-03-2003
Looks like I had some typos there but after correcting them it's still a no
go
/^(\w+) (\s+)/;
was changed to
/^(\w+)(\s+)(\S+)(\s+)(\S+)/;
# looking for word(s), 1 or more spaces, non-space(s), space(s),
non-space(s)


Still get the same output tho:
#output section
TCP = 6 # all is correct here
Use of uninitialized value in concatenation (.) or string at... # error
time - this refers to the 2nd print statement
RecvQ = # this is blank

What am I missing?





 