<> wrote on 17.06.2005:
>
> My input file contains the following snippets of data.
>
> Date, Amount, Refno
> 2005/01/07, 00000.096532030000,#0000015511
> 2005/06/07, 00006.963788280000,#0000015511
> 2005/06/13, 00002.243425000000,#0000030502
> 2006/06/16, 00002.243425000000,#0000030502
> 2006/06/16, 00047.230000000000,#0000030502
> 2005/02/18, 00002.243425000000,#0000040505
> 2005/02/13, 00001.738765000000,#0000030627
>
> The premise is simple. Check the Refno column and match that value
> against the corresponding value in the next row. If they both match,
> then append both "I" and the Date to the Refno to generate the ID. It
> then iterates through the rows, repeating the same step until it
> reaches the last occurrence of the Refno. When we reach the last
> occurrence of the Refno, i.e. we are about to start a new Refno
> sequence, we append a "P" instead.
Okay, since you need to look ahead to the next line, it would probably
be easiest to first slurp all the data and then iterate over it. We
can split each line into an array, which will make manipulating the
fields easier, and then reassemble the lines afterwards. So:
#!/usr/bin/perl
use warnings;
use strict;
my @lines = <>; # slurp all lines from input
chomp @lines; # remove newlines
shift @lines; # remove first line (column names)
# split the lines on commas followed by a space or a number sign (#):
my @data = map [split /,[# ]/], @lines;
print "ID, Date, Amount,#Refno\n"; # print new header line
foreach my $i (0 .. $#data) {
    my ($date, $amount, $refno) = @{ $data[$i] };  # columns of this row
    my $next = $data[$i+1][-1] || "";              # last col of next row
    my $char = ($refno eq $next ? "I" : "P");      # I if equal, else P
    my $id = join "_", $refno, $char, $date;       # construct id
    print "$id, $date, $amount,#$refno\n";         # print rebuilt line
}
There, that should do it. Hopefully the comments are clear enough
that you can see how it works. In fact, this turned out to be quite a
nice little example of several common Perl idioms.
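To check the logic against the sample rows from the question, here is a minimal sketch that wraps the same steps in a helper and returns the rebuilt lines instead of printing them directly. The name tag_lines and the embedded @lines list are only for illustration; they are not part of the script above.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Sample rows copied from the question (header line already removed).
my @lines = (
    "2005/01/07, 00000.096532030000,#0000015511",
    "2005/06/07, 00006.963788280000,#0000015511",
    "2005/06/13, 00002.243425000000,#0000030502",
    "2006/06/16, 00002.243425000000,#0000030502",
    "2006/06/16, 00047.230000000000,#0000030502",
    "2005/02/18, 00002.243425000000,#0000040505",
    "2005/02/13, 00001.738765000000,#0000030627",
);

# Hypothetical helper: same steps as the script above, but collecting
# the rebuilt lines in a list rather than printing each one.
sub tag_lines {
    my @data = map [split /,[# ]/], @_;   # one array ref per input line
    my @out;
    foreach my $i (0 .. $#data) {
        my ($date, $amount, $refno) = @{ $data[$i] };
        my $next = $data[$i+1][-1] || "";            # refno of next row, if any
        my $char = ($refno eq $next ? "I" : "P");    # I if equal, else P
        my $id   = join "_", $refno, $char, $date;
        push @out, "$id, $date, $amount,#$refno";
    }
    return @out;
}

my @tagged = tag_lines(@lines);
print "$_\n" for @tagged;
```

For these seven rows the I/P markers should come out in the order I, P, I, I, P, P, P, and the first line should read "0000015511_I_2005/01/07, 2005/01/07, 00000.096532030000,#0000015511".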
One idiom that may not be immediately obvious is $data[$i+1][-1] || "".
The array indexing works just as the comment says, but the "logical
or" with an empty string may be puzzling. In fact, all it does is
eliminate an unnecessary warning. When we reach the last line and
try to access the last column of the line after that, we get an
undefined value, and comparing that with eq would trigger a "Use of
uninitialized value" warning. The "logical or" replaces the undefined
value with an empty string. It won't affect the values on other lines,
because those are all non-empty strings other than "0", which perl
considers logically true.
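The effect of the guard can be seen in isolation with a small sketch like the following. It assumes a two-row @data array and collects warnings in a hypothetical @warnings list through a __WARN__ handler, so the two comparisons can be told apart; none of these names come from the script above.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Two rows, already split into columns (illustrative data only).
my @data = (
    [ "2005/02/18", "00002.243425000000", "0000040505" ],
    [ "2005/02/13", "00001.738765000000", "0000030627" ],
);

# Collect warnings in a list instead of printing them to STDERR.
my @warnings;
$SIG{__WARN__} = sub { push @warnings, @_ };

my $refno = $data[1][-1];             # refno of the last real row

# Without the guard: there is no row 2, so this is undef ...
my $plain = $data[2][-1];
my $same_plain = ($refno eq $plain);  # ... and eq warns about it

# With the guard: undef is replaced by "", so eq stays quiet.
my $guarded = $data[2][-1] || "";
my $same_guarded = ($refno eq $guarded);

print "warnings collected: ", scalar(@warnings), "\n";
```

Only the unguarded comparison should add an entry to @warnings. Both comparisons are false either way, so the guard changes nothing about the I/P decision; it only keeps the output free of the warning.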
--
Ilmari Karonen
To reply by e-mail, please replace ".invalid" with ".net" in address.