bowsayge <> wrote in comp.lang.perl.misc:
> shree said to us:
>
> [...]
> > Anyway, I'm struggling on thoughts of how to build a data structure to
> > transform the data into the desired output file. Any pointers, code
> > snippets will be greatly appreciated and I thank you in advance.
> [...]
>
> This isn't pretty, but it is one way of doing it.
>
> local $_;
> my ($current, %supp, @months);
>
> while ($_ = <STDIN>) {
>     chomp;
>     if (@months < 1) {
>         s/(\w{3}-\d{2})/push @months, $1; ''/eg;
>     } else {
>         if (/^([a-zA-Z0-9]+)\s+\%\s+/) {
>             my $sn = $1;
>             push @{$supp{suppliers}}, $sn;
>             s/^.*?%\s+//;
>             s/([\d\.\%]+)/push @{$supp{"$sn,\%"}}, $1; ''/eg;
>             $current = $sn;
>         } elsif (/^\s+Defects/) {
>             s/^\D+//;
>             s/(\d+)\s*/push @{$supp{"$current,defects"}}, $1; ''/eg;
>         } elsif (/^\s+Total/) {
>             s/^\D+//;
>             s/(\d+)\s*/push @{$supp{"$current,total"}}, $1; ''/eg;
>         }
>     }
> }
>
> foreach my $sn (@{$supp{suppliers}}) {
>     foreach my $no (0..$#months) {
>         my $pref = $supp{"$sn,%"};
>         my $dref = $supp{"$sn,defects"};
>         my $tref = $supp{"$sn,total"};
>         my $suffix = "\t$months[$no]\t@{[ $no + 1 ]}\n";
>         print "$sn\t\%\t$pref->[$no]$suffix";
>         print "$sn\tDefects\t$dref->[$no]$suffix";
>         print "$sn\tTotal\t$tref->[$no]$suffix";
>     }
> }
>
> __END__
I'll believe you, for one thing because I know you test your programs.
> EXPLANATION:
> The list of months is grabbed from the first line.
>
> Then a hash is created that contains a list of supplier names. The hash also
> is built up to contain the defect percentages, the number of defects and
> the totals.
>
> When it's time to create the output, the program iterates over the
> list of suppliers. For each supplier, the program iterates over the
> months, outputting the various statistics for that month.
I haven't analyzed your program to the last statement, but I have
some remarks.
It is much more general than necessary, in that it could read the
input data in any sequence and produce the right output. Even the
title line (which defines the expected months) could be buried
anywhere, if I'm not mistaken.
I interpret the OP's sample data to say that there is a title line
and then a sequence of groups of three, all formatted alike. It
is easier to read the file that way, expecting from each line a
given format. You can also handle each supplier as soon as you have
read the three lines, so you don't have to keep everything in memory.
With your approach, you will have to do that.
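To make the groups-of-three idea concrete, here is a minimal sketch. It is not the solution I posted elsewhere in the thread, just an illustration. The input layout (a title line with Mon-YY labels, then a "%" line, a "Defects" line and a "Total" line per supplier) is an assumption inferred from the regexes in your program, and the name reshape() and the sample data are made up for the example.

```perl
use strict;
use warnings;

# Read a title line, then one supplier per group of three lines, and
# print each supplier's rows as soon as its group has been read.
# Assumed (hypothetical) layout per supplier:
#   NAME % p1 p2 ...
#    Defects d1 d2 ...
#    Total t1 t2 ...
sub reshape {
    my ( $in, $out ) = @_;

    my $title = <$in>;
    return unless defined $title;

    # Keep only the fields of the title line that look like "Jan-04".
    my @months = grep { /^\w{3}-\d{2}$/ } split ' ', $title;

    while ( defined( my $pct_line = <$in> ) ) {
        next unless $pct_line =~ /\S/;    # skip blank lines
        my $def_line = <$in>;
        my $tot_line = <$in>;
        die "Incomplete record for: $pct_line"
            unless defined $def_line and defined $tot_line;

        # split ' ' ignores leading white space and the trailing newline.
        my ( $name, undef, @pct ) = split ' ', $pct_line;
        my ( undef, @defects )    = split ' ', $def_line;
        my ( undef, @total )      = split ' ', $tot_line;

        for my $i ( 0 .. $#months ) {
            my $suffix = "\t$months[$i]\t" . ( $i + 1 ) . "\n";
            print {$out} "$name\t%\t$pct[$i]$suffix",
                         "$name\tDefects\t$defects[$i]$suffix",
                         "$name\tTotal\t$total[$i]$suffix";
        }
    }
    return;
}

# Hypothetical sample data, not the OP's file.
my $sample = <<'EOT';
Supplier Metric Jan-04 Feb-04
ACME % 2.0% 1.5%
 Defects 2 3
 Total 100 200
EOT

open my $in, '<', \$sample or die "Cannot open in-memory sample: $!";
reshape( $in, \*STDOUT );
```

Only the month labels and the three lines of the current supplier are held at any time, so memory use does not grow with the number of suppliers.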
I'm also not too happy about your way of doing serious data processing in
an s///e expression. This approach can be powerful, but it's hard to
follow, and it's not needed here. It is far better to split() the data (on
white space) first. Then the fields can be processed as needed.
The rule (known as Randal's Rule) is: If you know what to keep, use
a match; if you know what to throw away, use split. "Know" can
be translated as "know the simpler regex for". Here, the default
split on white space is the obvious choice.
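A small illustration of both halves of the rule, as a sketch only. The two sample lines are hypothetical and merely follow the layout your regexes suggest.

```perl
use strict;
use warnings;

# Hypothetical data line: we know what to throw away (white space),
# so split is the simpler tool.
my $line = "ACME  %  2.0%  1.5%  3.1%\n";
my ( $name, $label, @values ) = split ' ', $line;

# Hypothetical title line: we know what to keep (things that look like
# "Jan-04"), so a match in list context is the simpler tool.
my $title  = "Supplier  Jan-04  Feb-04  Mar-04\n";
my @months = $title =~ /\b([A-Za-z]{3}-\d{2})\b/g;

print "$name: @values\n";
print "months: @months\n";
```

Neither statement modifies its input, in contrast to collecting the fields as a side effect of s///e.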
> The program doesn't do exactly what you want, since it gets input from
> STDIN and outputs to STDOUT, but you can easily adjust it.
That's a minor point. Example programs on Usenet (in Perl) routinely
print to STDOUT, and read from DATA or STDIN.
> Now watch someone convert this into a one-liner
Hardly. I have posted another solution (before I saw yours) that
takes the three-lines-at-a-time approach. You will note that it
takes some effort to deal with end-of-file correctly. That is typical
of this way of reading a file in groups of n lines and is a drawback.
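One way to keep the end-of-file bookkeeping in one place is a small helper that reads n lines at a time. This is a sketch under my own assumptions, not the code from my other post; read_group() is a name invented for the example.

```perl
use strict;
use warnings;

# Read the next $n lines from $fh. Returns an empty list at a clean end
# of file and dies if the file ends in the middle of a group.
sub read_group {
    my ( $fh, $n ) = @_;
    my @lines;
    while ( @lines < $n ) {
        my $line = <$fh>;
        last unless defined $line;
        chomp $line;
        push @lines, $line;
    }
    return if @lines == 0;
    die "Incomplete group: expected $n lines, got " . scalar(@lines) . "\n"
        if @lines < $n;
    return @lines;
}

# Hypothetical data: two complete groups of three lines.
my $data = "a1\na2\na3\nb1\nb2\nb3\n";
open my $fh, '<', \$data or die "Cannot open in-memory data: $!";

# The list assignment is false once read_group() returns an empty list.
while ( my @group = read_group( $fh, 3 ) ) {
    print join( ' | ', @group ), "\n";
}
```

The loop body never sees a partial group; whether a short final group should be fatal or silently dropped is a design choice that depends on the data.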
Anno