Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Perl > Perl Misc > perl - data structure build to transpose data

Reply
Thread Tools

perl - data structure build to transpose data

 
 
shree
Guest
Posts: n/a
 
      08-29-2004
Hi,

I have been asked to transpose a data file extracted from an Excel
report and saved as a .txt file. It lists time (MonthYear) in the
header (first row). The data consists of blocks of 3 lines per
supplier. In the example extract shown below, for Jan-04, total items
supplied by supplier 1 were 1000, of which there were 200 defects,
giving a defect ratio of 20%. I need to read-in this data file and
output a file whose format I can best illustrate via an example shown
below. Please note in the outfile's last column, it shows MonthID. If
the data were to begin with Feb-04 and go till July-04, instead of
Jan-04 to Mar-04 as shown below, then Feb-04 would be 1, Mar-04 2 and
so on.

Anyway, I'm struggling on thoughts of how to build a data structure to
transform the data into the desired output file. Any pointers, code
snippets will be greatly appreciated and I thank you in advance.

Best wishes,
Shree

Sample Data filein.dat
Jan-04 Feb-04 Mar-04
Supp1 % 20.00% 10.17% 7.14%
Defects 200 122 100
Total 1000 1200 1400
Supp2 % 3.00% 1.82% 1.90%
Defects 60 40 40
Total 2000 2200 2100

Desired Output fileout.txt
Supp1 % 20.00% Jan-04 1
Supp1 Defects 200 Jan-04 1
Supp1 Total 1000 Jan-04 1
Supp1 % 10.17% Feb-04 2
Supp1 Defects 122 Feb-04 2
Supp1 Total 1200 Feb-04 2
Supp1 % 7.14% Mar-04 3
Supp1 Defects 100 Mar-04 3
Supp1 Total 1400 Mar-04 3
Supp2 % 3.00% Jan-04 1
Supp2 Defects 60 Jan-04 1
Supp2 Total 2000 Jan-04 1
Supp2 % 1.82% Feb-04 2
Supp2 Defects 40 Feb-04 2
Supp2 Total 2200 Feb-04 2
Supp2 % 1.90% Mar-04 3
Supp2 Defects 40 Mar-04 3
Supp2 Total 2100 Mar-04 3
 
Reply With Quote
 
 
 
 
wfsp
Guest
Posts: n/a
 
      08-29-2004
"shree" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed) om...
> Hi,
>
> I have been asked to transpose a data file extracted from an Excel
> report and saved as a .txt file. It lists time (MonthYear) in the
> header (first row). The data consists of blocks of 3 lines per
> supplier. In the example extract shown below, for Jan-04, total items
> supplied by supplier 1 were 1000, of which there were 200 defects,
> giving a defect ratio of 20%. I need to read-in this data file and
> output a file whose format I can best illustrate via an example shown
> below. Please note in the outfile's last column, it shows MonthID. If
> the data were to begin with Feb-04 and go till July-04, instead of
> Jan-04 to Mar-04 as shown below, then Feb-04 would be 1, Mar-04 2 and
> so on.
>
> Anyway, I'm struggling on thoughts of how to build a data structure to
> transform the data into the desired output file. Any pointers, code
> snippets will be greatly appreciated and I thank you in advance.
>
> Best wishes,
> Shree
>
> Sample Data filein.dat
> Jan-04 Feb-04 Mar-04
> Supp1 % 20.00% 10.17% 7.14%
> Defects 200 122 100
> Total 1000 1200 1400
> Supp2 % 3.00% 1.82% 1.90%
> Defects 60 40 40
> Total 2000 2200 2100
>
> Desired Output fileout.txt
> Supp1 % 20.00% Jan-04 1
> Supp1 Defects 200 Jan-04 1
> Supp1 Total 1000 Jan-04 1
> Supp1 % 10.17% Feb-04 2
> Supp1 Defects 122 Feb-04 2
> Supp1 Total 1200 Feb-04 2
> Supp1 % 7.14% Mar-04 3
> Supp1 Defects 100 Mar-04 3
> Supp1 Total 1400 Mar-04 3
> Supp2 % 3.00% Jan-04 1
> Supp2 Defects 60 Jan-04 1
> Supp2 Total 2000 Jan-04 1
> Supp2 % 1.82% Feb-04 2
> Supp2 Defects 40 Feb-04 2
> Supp2 Total 2200 Feb-04 2
> Supp2 % 1.90% Mar-04 3
> Supp2 Defects 40 Mar-04 3
> Supp2 Total 2100 Mar-04 3


This produces the output you indicated. If you do need a structure (e.g. a
hash of hashes) you could build it instead of using the for loop.
perlreftut, perldsc and perllol provide everything you need to construct
complex data structures.

#!/bin/perl5

use strict;
use warnings;

chomp( my $header = <DATA> );
my @months = split ' ', $header;
my $month = $#months;
while ( ! eof(DATA) ){
my @pc = split ' ', <DATA>;
my $supplier = shift @pc;
my @defects = split ' ', <DATA>;
my @total = split ' ', <DATA>;
for (my $i=1;$i<=$month+1;$i++){
my $month = $months[$i-1];
print $supplier, "\t",
$pc[0], "\t",
$pc[$i], "\t",
$month, "\t",
$i, "\n",
$supplier, "\t",
$defects[0], "\t",
$defects[$i], "\t",
$month, "\t",
$i, "\n",
$supplier, "\t",
$total[0], "\t",
$total[$i], "\t",
$month, "\t",
$i, "\n";
}
}

__DATA__
Jan-04 Feb-04 Mar-04
Supp1 % 20.00% 10.17% 7.14%
Defects 200 122 100
Total 1000 1200 1400
Supp2 % 3.00% 1.82% 1.90%
Defects 60 40 40
Total 2000 2200 2100


 
Reply With Quote
 
 
 
 
wfsp
Guest
Posts: n/a
 
      08-29-2004
<snip question>
> This produces the output you indicated. If you do need a structure (e.g. a
> hash of hashes) you could build it instead of using the for loop.
> perlreftut, perldsc and perllol provide everything you need to construct
> complex data structures.
>
> #!/bin/perl5
>
> use strict;
> use warnings;
>
> chomp( my $header = <DATA> );
> my @months = split ' ', $header;
> my $month = $#months;
> while ( ! eof(DATA) ){
> my @pc = split ' ', <DATA>;
> my $supplier = shift @pc;
> my @defects = split ' ', <DATA>;
> my @total = split ' ', <DATA>;
> for (my $i=1;$i<=$month+1;$i++){
> my $month = $months[$i-1];
> print $supplier, "\t",
> $pc[0], "\t",
> $pc[$i], "\t",
> $month, "\t",
> $i, "\n",
> $supplier, "\t",
> $defects[0], "\t",
> $defects[$i], "\t",
> $month, "\t",
> $i, "\n",
> $supplier, "\t",
> $total[0], "\t",
> $total[$i], "\t",
> $month, "\t",
> $i, "\n";
> }
> }
>
> __DATA__
> Jan-04 Feb-04 Mar-04
> Supp1 % 20.00% 10.17% 7.14%
> Defects 200 122 100
> Total 1000 1200 1400
> Supp2 % 3.00% 1.82% 1.90%
> Defects 60 40 40
> Total 2000 2200 2100
>

Poor choice of variable names!
I should have said:
> my $month_count = $#months;

and:
> for (my $i=1;$i<=$month_count+1;$i++){

It works as is but looking at it again it's clearer this way.


 
Reply With Quote
 
Tad McClellan
Guest
Posts: n/a
 
      08-29-2004
wfsp <(E-Mail Removed)> wrote:

> I should have said:
>> my $month_count = $#months;


>> for (my $i=1;$i<=$month_count+1;$i++){

> It works as is but looking at it again it's clearer this way.



foreach my $i ( 1 .. @mounths )

would be clearer yet.


--
Tad McClellan SGML consulting
http://www.velocityreviews.com/forums/(E-Mail Removed) Perl programming
Fort Worth, Texas
 
Reply With Quote
 
David K. Wall
Guest
Posts: n/a
 
      08-29-2004
(E-Mail Removed) (shree) wrote:

> Sample Data filein.dat
> Jan-04 Feb-04 Mar-04
> Supp1 % 20.00% 10.17% 7.14%
> Defects 200 122 100
> Total 1000 1200 1400
> Supp2 % 3.00% 1.82% 1.90%
> Defects 60 40 40
> Total 2000 2200 2100
>
> Desired Output fileout.txt
> Supp1 % 20.00% Jan-04 1
> Supp1 Defects 200 Jan-04 1
> Supp1 Total 1000 Jan-04 1
> Supp1 % 10.17% Feb-04 2
> Supp1 Defects 122 Feb-04 2
> Supp1 Total 1200 Feb-04 2
> Supp1 % 7.14% Mar-04 3
> Supp1 Defects 100 Mar-04 3
> Supp1 Total 1400 Mar-04 3
> Supp2 % 3.00% Jan-04 1
> Supp2 Defects 60 Jan-04 1
> Supp2 Total 2000 Jan-04 1
> Supp2 % 1.82% Feb-04 2
> Supp2 Defects 40 Feb-04 2
> Supp2 Total 2200 Feb-04 2
> Supp2 % 1.90% Mar-04 3
> Supp2 Defects 40 Mar-04 3
> Supp2 Total 2100 Mar-04 3


Yet another way:

use strict;
use warnings;

chomp( my @months = split ' ', <DATA> );
while (not eof DATA) {

my (%supplier, $supplier_name);
for ( 1..3 ) {
my ($name, $type, @data) = split /\s+/, <DATA>;
$supplier_name = $name if $name;
@{$supplier{$type}}{@months} = @data;
}

my $month_num;
for my $month ( @months ) {
$month_num++;
for my $type (sort keys %supplier) {
print join( "\t",
$supplier_name,
$type,
$supplier{$type}{$month},
$month,
$month_num
),
"\n";
}
}
}


__DATA__
Jan-04 Feb-04 Mar-04
Supp1 % 20.00% 10.17% 7.14%
Defects 200 122 100
Total 1000 1200 1400
Supp2 % 3.00% 1.82% 1.90%
Defects 60 40 40
Total 2000 2200 2100
 
Reply With Quote
 
Anno Siegel
Guest
Posts: n/a
 
      08-29-2004
shree <(E-Mail Removed)> wrote in comp.lang.perl.misc:
> Hi,
>
> I have been asked to transpose a data file extracted from an Excel
> report and saved as a .txt file. It lists time (MonthYear) in the
> header (first row). The data consists of blocks of 3 lines per
> supplier. In the example extract shown below, for Jan-04, total items
> supplied by supplier 1 were 1000, of which there were 200 defects,
> giving a defect ratio of 20%. I need to read-in this data file and
> output a file whose format I can best illustrate via an example shown
> below. Please note in the outfile's last column, it shows MonthID. If
> the data were to begin with Feb-04 and go till July-04, instead of
> Jan-04 to Mar-04 as shown below, then Feb-04 would be 1, Mar-04 2 and
> so on.
>
> Anyway, I'm struggling on thoughts of how to build a data structure to
> transform the data into the desired output file. Any pointers, code
> snippets will be greatly appreciated and I thank you in advance.


It would be nice to see a snippet of your code, or even of your
thoughts about the problem. Just dumping the problem description
*you* got to the newsgroup is frowned upon. You haven't even begun
an analysis.

> Sample Data filein.dat
> Jan-04 Feb-04 Mar-04
> Supp1 % 20.00% 10.17% 7.14%
> Defects 200 122 100
> Total 1000 1200 1400
> Supp2 % 3.00% 1.82% 1.90%
> Defects 60 40 40
> Total 2000 2200 2100


So the first data line is special and gives you the months to expect.
Assuming the data in DATA, get the list of months and generate the
MonthID's like this (all untested):

my ( @months, %month_id);
@months = split for ( scalar <DATA> );
@month_id{ @months} = 1 .. @months;

Also set an output format at this point, you'll need it later:

my $ofmt = "%-5s %-7s %-7s %-6s %2d\n";

You can save yourself the effort of setting this up and use my
module Text::Table instead, but the format will do.

Now you can process the following lines in groups of three:

line: while ( 1 ) {
my ( $supp, %supp_data);
for ( scalar <DATA> ) { # get one line in $_
last line unless defined; # regular end of file
( $supp, undef, @{ $supp_data{ '%'}}{ @months}) = split;
}
for ( scalar <DATA> ) {
die "data error 2" unless defined;
( undef, @{ $supp_data{ Defects}}{ @months}) = split;
}
for ( scalar <DATA> ) {
die "data error 3" unless defined;
( undef, @{ $supp_data{ Total}}{ @months}) = split;
}

There should probably be data checks in a real program, besides the
end-of-file test I provided. Anyway, now you have collected all data
for one supplier and can print it in any format you want, for instance
this:

for my $month ( @months) {
for ( qw( % Defects Total) ) {
printf $ofmt, $supp, $_, $supp_data{ $_}->{ $month},
$month, $month_id{ $month};
}
}
}


> Desired Ouput file out.txt


[snipped]

Anno
 
Reply With Quote
 
Anno Siegel
Guest
Posts: n/a
 
      08-29-2004
bowsayge <(E-Mail Removed)> wrote in comp.lang.perl.misc:
> shree said to us:
>
> [...]
> > Anyway, I'm struggling on thoughts of how to build a data structure to
> > transform the data into the desired output file. Any pointers, code
> > snippets will be greatly appreciated and I thank you in advance.

> [...]
>
> This isn't pretty, but it is one way of doing it.
>
> local $_;
> my ($current, %supp, @months);
>
> while ($_ = <STDIN>) {
> chomp;
> if (@months < 1) {
> s/(\w{3}-\d{2})/push @months, $1; ''/eg;
> } else {
> if (/^([a-zA-Z0-9]+)\s+\%\s+/) {
> my $sn = $1;
> push @{$supp{suppliers}}, $sn;
> s/^.*?%\s+//;
> s/([\d\.\%]+)/push @{$supp{"$sn,\%"}}, $1; ''/eg;
> $current = $sn;
> } elsif (/^\s+Defects/) {
> s/^\D+//;
> s/(\d+)\s*/push @{$supp{"$current,defects"}}, $1; ''/eg;
> } elsif (/^\s+Total/) {
> s/^\D+//;
> s/(\d+)\s*/push @{$supp{"$current,total"}}, $1; ''/eg;
> }
> }
> }
>
> foreach my $sn (@{$supp{suppliers}}) {
> foreach my $no (0..$#months) {
> my $pref = $supp{"$sn,%"};
> my $dref = $supp{"$sn,defects"};
> my $tref = $supp{"$sn,total"};
> my $suffix = "\t$months[$no]\t@{[ $no + 1 ]}\n";
> print "$sn\t\%\t$pref->[$no]$suffix";
> print "$sn\tDefects\t$dref->[$no]$suffix";
> print "$sn\tTotal\t$tref->[$no]$suffix";
> }
> }
>
> __END__


I'll believe you, for one because I know you test your programs

> EXPLANATION:
> The list of months is grabbed from the first line.
>
> Then a hash is created that contains a list of supplier names. The hash also
> is built up to contain the defect percentages, the number of defects and
> the totals.
>
> When it's time to create the output, the program iterates over the
> list of suppliers. For each supplier, the program iterates over the
> months, outputting the various statistics for that month.


I haven't analyzed your program to the last statement, but I have
some remarks.

It is much more general than necessary, in that it could read the
input data in any sequence and produce the right output. Even the
title line (which defines the expected months) could be buried
anywhere, if I'm not mistaken.

I interpret the OPs sample data to say that there is a title line
and then a sequence of groups of three, all formatted alike. It
is easier to read the file that way, expecting from each line a
given format. You can also handle each supplier as soon as you have
read the three lines, so you don't have to keep everything in memory.
With your approach, you will have to do that.

I'm also not too happy about your way to do serious data processing in
an s///e expression. This approach can be powerful, but it's hard to
follow, and it's not needed here. The data is far better split() (on
white space) first. Then the fields can be processed as needed.

The rule (known as Randal's Rule) is: If you know what to keep, use
a match, if you know what to throw away, use split. "Know" can
be translated as "know the simpler regex for". Here, the default
split on white space is the obvious choice.

> The program doesn't do exactly what you want, since it gets input from
> STDIN and outputs to STDOUT, but you can easily adjust it.


That's a minor point. Example programs on Usenet (in Perl) routinely
print to STDOUT, and read from DATA or STDIN.

> Now watch someone convert this into a one-liner


Hardly. I have posted another solution (before I saw yours), that
takes the three-lines-at-a-time approach. You will note that it
takes some effort to deal with end-of-file correctly. That is typical
for this way of reading a file in groups of n lines and is a drawback.

Anno
 
Reply With Quote
 
David K. Wall
Guest
Posts: n/a
 
      08-29-2004
http://www.velocityreviews.com/forums/(E-Mail Removed)-berlin.de (Anno Siegel) wrote:

> It is much more general than necessary, in that it could read the
> input data in any sequence and produce the right output. Even the
> title line (which defines the expected months) could be buried
> anywhere, if I'm not mistaken.


That's a valid criticism of my post as well, since it will produce the
"correct" output as long as the three lines for title/percent, defects, and
total are all grouped together. I'll admit I was thinking of this as a
strength, because another line could be added to a supplier record with
minimal changes to the code. But in your post you mentioned data checks,
something I hadn't considered since I assumed the data was okay. It's good to
see another viewpoint.

Maybe if it were MY data I would have been more careful.
 
Reply With Quote
 
Anno Siegel
Guest
Posts: n/a
 
      08-29-2004
David K. Wall <(E-Mail Removed)> wrote in comp.lang.perl.misc:
> (E-Mail Removed) (shree) wrote:


[specifications]

> Yet another way:


Shree got lucky, undeservedly.

> use strict;
> use warnings;
>
> chomp( my @months = split ' ', <DATA> );
> while (not eof DATA) {
>
> my (%supplier, $supplier_name);
> for ( 1..3 ) {


This could use

die "data error $_" if eof;

> my ($name, $type, @data) = split /\s+/, <DATA>;
> $supplier_name = $name if $name;
> @{$supplier{$type}}{@months} = @data;
> }


Ah, nice common format for all data records. BTW, the default
split on ' ' wouldn't do, because it skips initial white space,
suppressing leading empty fields. Did I ever mention that split()
is too clever for its own good?

This combines part of bowsayge's flexibility WRT line sequence
with sequential processing in groups of three. The lines could
be permuted within each supplier and would end up in the right
place.

> my $month_num;
> for my $month ( @months ) {
> $month_num++;
> for my $type (sort keys %supplier) {
> print join( "\t",
> $supplier_name,
> $type,
> $supplier{$type}{$month},
> $month,
> $month_num
> ),
> "\n";
> }
> }
> }


I would... umm, did use a pre-assigned hash %month_num (or somesuch),
instead of counting $month_num each time. It takes some clutter out
of the print loop.

> __DATA__


[snipped]

I notice that you used the exact same data structure that I used to store
the data for each supplier. I wouldn't be amazed if you'd come up with it
independently, it seems inherent to the problem. I haven't analyzed
bowsayge's, but aside from some (unnecessary) trick with appending
key parts, it appears similar.

Anno
 
Reply With Quote
 
David K. Wall
Guest
Posts: n/a
 
      08-29-2004
I wrote:

> my (%supplier, $supplier_name);
> for ( 1..3 ) {
> my ($name, $type, @data) = split /\s+/, <DATA>;
> $supplier_name = $name if $name;
> @{$supplier{$type}}{@months} = @data;
> }


Looking at this again after responding to Anno, I recall just why I used a
for loop: my original (unposted) solution handled each line separately, but
when I looked at it I realized I was repeating basically the same code three
times. I didn't like that, so I condensed it.

(Can you tell it's Sunday evening and I'm bored?
 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Help needed with Transpose XML and XSLT problem infiniti XML 4 01-16-2006 01:28 PM
Graph - node - transpose VisionSet Java 4 10-18-2005 07:15 PM
Export datagrid to Excel template, transpose ? sweeney_jones@hotmail.com ASP .Net 0 08-18-2005 09:15 PM
SWsoft Acronis Disk Director Suite 9.0 Build 508, Acronis OS Selector 8.0 Build 917, Acronis Partition Expert 2003 Build 292, Acronis Power Utilities 2004 Build 502, F-SECURE.ANTI vIRUS.PROXY v1.10.17.WINALL, F-SECURE.ANTI vIRUS v5.50.10260 for CITRI vvcd Computer Support 0 09-25-2004 01:38 AM
Transpose data into a 2 dimensional array. deepak p Perl Misc 0 08-02-2004 10:00 PM



Advertisments