Velocity Reviews

esimbo@gmail.com 06-17-2005 02:48 PM

Working with Duplicates in Perl to generate Unique ID
 
Hi

I have been tasked with producing a new input file which requires some
manipulation of a file to generate a unique ID. I have been advised
that Perl will be the simplest course of action here but in all
honesty, I'm not sure where to start.

My input file contains the following snippets of data.

Date, Amount, Refno
2005/01/07, 00000.096532030000,#0000015511
2005/06/07, 00006.963788280000,#0000015511
2005/06/13, 00002.243425000000,#0000030502
2006/06/16, 00002.243425000000,#0000030502
2006/06/16, 00047.230000000000,#0000030502
2005/02/18, 00002.243425000000,#0000040505
2005/02/13, 00001.738765000000,#0000030627

Based on this file, I need to generate a new file containing the same
fields but with an added column for the Unique id.

The premise is simple. Check the Refno column and match that value
against the corresponding value in the next row. If they match, then
append both "I" and the Date to the Refno to generate the ID. Repeat
this step for each row until the last occurrence of the Refno is
reached. At the last occurrence of a Refno, i.e. when the next row
starts a new Refno sequence, append a "P" instead of the "I".

Therefore, using the sample above, the result I would expect is as
follows

ID,Date,Amount, Refno
0000015511_I_2005/01/07, 2005/01/07, 00000.096532030000,#0000015511
0000015511_P_2005/06/07, 2005/06/07, 00006.963788280000,#0000015511
0000030502_I_2005/06/13, 2005/06/13, 00002.243425000000,#0000030502
0000030502_I_2006/06/16, 2006/06/16, 00002.243425000000,#0000030502
0000030502_P_2006/06/16, 2006/06/16, 00047.230000000000,#0000030502
0000040505_P_2005/02/18, 2005/02/18, 00002.243425000000,#0000040505
0000030627_P_2005/02/13, 2005/02/13, 00001.738765000000,#0000030627

If anyone can provide any assistance here, I'd really be grateful.

Regards.


A. Sinan Unur 06-17-2005 03:38 PM

Re: Working with Duplicates in Perl to generate Unique ID
 
esimbo@gmail.com wrote in news:1119019737.746603.282920
@o13g2000cwo.googlegroups.com:

> I have been tasked with producing a new input file which requires some
> manipulation of a file to generate a unique ID. I have been advised
> that Perl will be the simplest course of action here but in all
> honesty, I'm not sure where to start.
>
> My input file contains the following snippets of data.
>
> Date, Amount, Refno
> 2005/01/07, 00000.096532030000,#0000015511
> 2005/06/07, 00006.963788280000,#0000015511
> 2005/06/13, 00002.243425000000,#0000030502
> 2006/06/16, 00002.243425000000,#0000030502
> 2006/06/16, 00047.230000000000,#0000030502
> 2005/02/18, 00002.243425000000,#0000040505
> 2005/02/13, 00001.738765000000,#0000030627


....

> ID,Date,Amount, Refno
> 0000015511_I_2005/01/07, 2005/01/07, 00000.096532030000,#0000015511
> 0000015511_P_2005/06/07, 2005/06/07, 00006.963788280000,#0000015511
> 0000030502_I_2005/06/13, 2005/06/13, 00002.243425000000,#0000030502
> 0000030502_I_2006/06/16, 2006/06/16, 00002.243425000000,#0000030502
> 0000030502_P_2006/06/16, 2006/06/16, 00047.230000000000,#0000030502
> 0000040505_P_2005/02/18, 2005/02/18, 00002.243425000000,#0000040505
> 0000030627_P_2005/02/13, 2005/02/13, 00001.738765000000,#0000030627


I would use a hash where each Refno is a key and each value is a
reference to an array of hash references, assuming that the file is a
reasonable size.
You will probably need

perldoc -f split

Given this information, you can write some code now. Then, if you have
problems with your code, please post again.

In the meantime, you might benefit from reading

perldoc perlreftut

as well as the posting guidelines for this group.
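
A minimal sketch of the data structure described above, assuming the
three-column layout from the original post. The sub name group_by_refno
and the hash keys date and amount are hypothetical, chosen only for
illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: group rows by Refno.
# Returns a hash reference: Refno => [ { date => ..., amount => ... }, ... ]
sub group_by_refno {
    my @lines = @_;
    my %by_refno;
    for my $line (@lines) {
        chomp $line;
        # Split on a comma followed by optional whitespace and an optional '#'.
        my ($date, $amount, $refno) = split /,\s*#?/, $line;
        next unless defined $refno;       # skip blank or malformed lines
        next unless $refno =~ /^\d+$/;    # skip the header line
        push @{ $by_refno{$refno} }, { date => $date, amount => $amount };
    }
    return \%by_refno;
}

my $groups = group_by_refno(
    "Date, Amount, Refno\n",
    "2005/01/07, 00000.096532030000,#0000015511\n",
    "2005/06/07, 00006.963788280000,#0000015511\n",
    "2005/02/18, 00002.243425000000,#0000040505\n",
);

for my $refno (sort keys %$groups) {
    my @rows = @{ $groups->{$refno} };
    print "$refno has ", scalar @rows, " row(s); last date is $rows[-1]{date}\n";
}
```

A hash does not preserve file order, so a complete solution built this
way would also have to remember the order in which each Refno first
appears.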

Sinan


--
A. Sinan Unur <1usa@llenroc.ude.invalid>
(reverse each component and remove .invalid for email address)

comp.lang.perl.misc guidelines on the WWW:
http://mail.augustmail.com/~tadmc/cl...uidelines.html

kingpin2502 06-17-2005 04:44 PM

Re: Working with Duplicates in Perl to generate Unique ID
 
Sinan

Thanks for your response. I've got a start, which is what I needed. I
must admit I wasn't aware of the rules prior to posting but I'll read
them before I post again.

Thanks.

Emmon


John W. Krahn 06-17-2005 11:08 PM

Re: Working with Duplicates in Perl to generate Unique ID
 
esimbo@gmail.com wrote:
>
> I have been tasked with producing a new input file which requires some
> manipulation of a file to generate a unique ID. I have been advised
> that Perl will be the simplest course of action here but in all
> honesty, I'm not sure where to start.
>
> My input file contains the following snippets of data.
>
> Date, Amount, Refno
> 2005/01/07, 00000.096532030000,#0000015511
> 2005/06/07, 00006.963788280000,#0000015511
> 2005/06/13, 00002.243425000000,#0000030502
> 2006/06/16, 00002.243425000000,#0000030502
> 2006/06/16, 00047.230000000000,#0000030502
> 2005/02/18, 00002.243425000000,#0000040505
> 2005/02/13, 00001.738765000000,#0000030627
>
> Based on this file, I need to generate a new file containing the same
> fields but with an added column for the Unique id.
>
> The premise is simple. Check the Refno column and match that value
> against the corresponding value in the next row. If they match, then
> append both "I" and the Date to the Refno to generate the ID. Repeat
> this step for each row until the last occurrence of the Refno is
> reached. At the last occurrence of a Refno, i.e. when the next row
> starts a new Refno sequence, append a "P" instead of the "I".
>
> Therefore, using the sample above, the result I would expect is as
> follows
>
> ID,Date,Amount, Refno
> 0000015511_I_2005/01/07, 2005/01/07, 00000.096532030000,#0000015511
> 0000015511_P_2005/06/07, 2005/06/07, 00006.963788280000,#0000015511
> 0000030502_I_2005/06/13, 2005/06/13, 00002.243425000000,#0000030502
> 0000030502_I_2006/06/16, 2006/06/16, 00002.243425000000,#0000030502
> 0000030502_P_2006/06/16, 2006/06/16, 00047.230000000000,#0000030502
> 0000040505_P_2005/02/18, 2005/02/18, 00002.243425000000,#0000040505
> 0000030627_P_2005/02/13, 2005/02/13, 00001.738765000000,#0000030627
>
> If anyone can provide any assistance here, I'd really be grateful.


use warnings;
use strict;

my %seen;

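# Read the lines last-to-first, so the first time a Refno is seen is its
# last row in the file ('P'); every earlier row for that Refno gets 'I'.
print
    reverse
    map $_->[2]
        ? "$_->[2]_" . ( $seen{ $_->[2] }++ ? 'I' : 'P' ) . "_$_->[1], $_->[0]"
        : $_->[0],                           # no Refno (header line): unchanged
    map [ $_, m!^([\d/]+)[^#]+#(\d+)$! ],    # [ whole line, date, refno ]
    reverse
    <DATA>;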


__DATA__
Date, Amount, Refno
2005/01/07, 00000.096532030000,#0000015511
2005/06/07, 00006.963788280000,#0000015511
2005/06/13, 00002.243425000000,#0000030502
2006/06/16, 00002.243425000000,#0000030502
2006/06/16, 00047.230000000000,#0000030502
2005/02/18, 00002.243425000000,#0000040505
2005/02/13, 00001.738765000000,#0000030627
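
The pipeline above is compact, so here is a hedged step-by-step sketch
of what it appears to do, written as an explicit loop. The sub name
tag_lines is hypothetical. Like the original, it keeps one %seen hash
for the whole file, so it assumes that all rows for a Refno are
adjacent; if a Refno reappeared later in the file, only its very last
row would get the "P".

```perl
use strict;
use warnings;

# Hypothetical helper: returns the tagged lines in their original order.
sub tag_lines {
    my @lines = @_;
    my (%seen, @out);
    for my $line (reverse @lines) {
        # Capture the leading date and the digits after '#'.
        if (my ($date, $refno) = $line =~ m!^([\d/]+)[^#]+#(\d+)$!) {
            # Reading backwards, the first sighting of a Refno is its last row.
            my $char = $seen{$refno}++ ? 'I' : 'P';
            unshift @out, "${refno}_${char}_$date, $line";
        }
        else {
            unshift @out, $line;    # header or blank line: pass through as-is
        }
    }
    return @out;
}

print tag_lines(
    "Date, Amount, Refno\n",
    "2005/06/13, 00002.243425000000,#0000030502\n",
    "2006/06/16, 00002.243425000000,#0000030502\n",
    "2006/06/16, 00047.230000000000,#0000030502\n",
    "2005/02/18, 00002.243425000000,#0000040505\n",
);
```

As in the original, the header line is passed through unchanged, so it
does not gain an ID column name.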



John
--
use Perl;
program
fulfillment

Ilmari Karonen 06-17-2005 11:45 PM

Re: Working with Duplicates in Perl to generate Unique ID
 
esimbo@gmail.com <esimbo@gmail.com> wrote on 17.06.2005:
>
> My input file contains the following snippets of data.
>
> Date, Amount, Refno
> 2005/01/07, 00000.096532030000,#0000015511
> 2005/06/07, 00006.963788280000,#0000015511
> 2005/06/13, 00002.243425000000,#0000030502
> 2006/06/16, 00002.243425000000,#0000030502
> 2006/06/16, 00047.230000000000,#0000030502
> 2005/02/18, 00002.243425000000,#0000040505
> 2005/02/13, 00001.738765000000,#0000030627
>
> The premise is simple. Check the Refno column and match that value
> against the corresponding value in the next row. If they match, then
> append both "I" and the Date to the Refno to generate the ID. Repeat
> this step for each row until the last occurrence of the Refno is
> reached. At the last occurrence of a Refno, i.e. when the next row
> starts a new Refno sequence, append a "P" instead of the "I".


Okay, since you need to look ahead to the next line, it would probably
be easiest to first slurp all the data and then iterate over it. We
can split each line into an array, which will make manipulating the
fields easier, and then reassemble the lines afterwards. So:

#!/usr/bin/perl
use warnings;
use strict;

my @lines = <>; # slurp all lines from input
chomp @lines; # remove newlines
shift @lines; # remove first line (column names)

# split the lines on commas followed by a space or a number sign (#):
my @data = map [split /,[# ]/], @lines;

print "ID, Date, Amount,#Refno\n"; # print new header line

foreach my $i (0 .. $#data) {
    my ($date, $amount, $refno) = @{ $data[$i] };  # columns of this row
    my $next = $data[$i+1][-1] || "";              # last col of next row
    my $char = ($refno eq $next ? "I" : "P");      # I if equal, else P
    my $id = join "_", $refno, $char, $date;       # construct id
    print "$id, $date, $amount,#$refno\n";         # print rebuilt line
}

There, that should do it. Hopefully the comments are clear enough
that you can see how it works. In fact, this turned out to be quite a
nice little example of several common Perl idioms.

One idiom that may not be immediately obvious is $data[$i+1][-1] || "".
The array indexing works just as the comment says, but the "logical
or" with an empty string may be puzzling. In fact, all it does is
eliminate an unnecessary warning. When we reach the last line, and
try to access the last column of the line after that, we get an
undefined value. The "logical or" replaces it with an empty string.
It won't affect the values on other lines, because those are all
considered by perl to be logically true.
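
To illustrate the point above, here is a small sketch contrasting
|| "" with the defined-or operator //, which is available from Perl
5.10 onward. The sub names or_empty and dor_empty are hypothetical. The
two forms differ only for values that are defined but false, such as
the string "0", which does not occur as a Refno in the sample data.

```perl
use strict;
use warnings;
use 5.010;    # the // operator requires Perl 5.10 or later

# Hypothetical helpers wrapping the two ways of supplying a default.
sub or_empty  { my ($v) = @_; return $v || ""; }   # replaces any false value
sub dor_empty { my ($v) = @_; return $v // ""; }   # replaces only undef

for my $value (undef, "0000030627", "0") {
    my $shown = defined $value ? "'$value'" : 'undef';
    printf "%-14s ||: '%s'   //: '%s'\n",
        $shown, or_empty($value), dor_empty($value);
}
```

For undef and for an ordinary Refno both forms give the same result;
only the "0" case differs, where || "" yields an empty string and //
keeps the "0".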

--
Ilmari Karonen
To reply by e-mail, please replace ".invalid" with ".net" in address.

kingpin2502 06-20-2005 09:26 AM

Re: Working with Duplicates in Perl to generate Unique ID
 
Jim

I am very grateful for this.

Thank you
Emmon


kingpin2502 06-20-2005 09:44 AM

Re: Working with Duplicates in Perl to generate Unique ID
 
Hi Ilmari

That was very clear thank you. I appreciate that very much.

Thanks
Emmon


Sherm Pendley 06-20-2005 09:51 AM

Re: Working with Duplicates in Perl to generate Unique ID
 
"kingpin2502" <esimbo@gmail.com> writes:

> I am very grateful for this.


For what?

sherm--

kingpin2502 06-20-2005 09:52 AM

Re: Working with Duplicates in Perl to generate Unique ID
 
John

Thanks for your help with this. I really appreciate it.

Thanks
Emmon


Sherm Pendley 06-20-2005 09:52 AM

Re: Working with Duplicates in Perl to generate Unique ID
 
"kingpin2502" <esimbo@gmail.com> writes:

> That was very clear thank you. I appreciate that very much.


*What* was very clear? Please quote enough of the message you're replying
to to provide sufficient context.

sherm--

