Working with Duplicates in Perl to generate Unique ID

 
 
esimbo@gmail.com
      06-17-2005
Hi

I have been tasked with producing a new input file which requires some
manipulation of a file to generate a unique ID. I have been advised
that Perl will be the simplest course of action here but in all
honesty, I'm not sure where to start.

My input file contains the following snippets of data.

Date, Amount, Refno
2005/01/07, 00000.096532030000,#0000015511
2005/06/07, 00006.963788280000,#0000015511
2005/06/13, 00002.243425000000,#0000030502
2006/06/16, 00002.243425000000,#0000030502
2006/06/16, 00047.230000000000,#0000030502
2005/02/18, 00002.243425000000,#0000040505
2005/02/13, 00001.738765000000,#0000030627

Based on this file, I need to generate a new file containing the same
fields but with an added column for the Unique id.

The premise is simple. Check the Refno column and compare its value
with the corresponding value in the next row. If they match, append
both "I" and the Date to the Refno to generate the ID. Repeat this for
each row until the last occurrence of the Refno is reached. On that
last occurrence, i.e. when the next row starts a new Refno sequence,
append a "P" instead of an "I".

Therefore, using the sample above, the result I would expect is as
follows

ID,Date,Amount, Refno
0000015511_I_2005/01/07, 2005/01/07, 00000.096532030000,#0000015511
0000015511_P_2005/06/07, 2005/06/07, 00006.963788280000,#0000015511
0000030502_I_2005/06/13, 2005/06/13, 00002.243425000000,#0000030502
0000030502_I_2006/06/16, 2006/06/16, 00002.243425000000,#0000030502
0000030502_P_2006/06/16, 2006/06/16, 00047.230000000000,#0000030502
0000040505_P_2005/02/18, 2005/02/18, 00002.243425000000,#0000040505
0000030627_P_2005/02/13, 2005/02/13, 00001.738765000000,#0000030627

If anyone can provide any assistance here, I'd really be grateful.

Regards.

 
A. Sinan Unur
      06-17-2005
(E-Mail Removed) wrote in
news:1119019737.746603.282920@o13g2000cwo.googlegroups.com:

> I have been tasked with producing a new input file which requires some
> manipulation of a file to generate a unique ID. I have been advised
> that Perl will be the simplest course of action here but in all
> honesty, I'm not sure where to start.
>
> My input file contains the following snippets of data.
>
> Date, Amount, Refno
> 2005/01/07, 00000.096532030000,#0000015511
> 2005/06/07, 00006.963788280000,#0000015511
> 2005/06/13, 00002.243425000000,#0000030502
> 2006/06/16, 00002.243425000000,#0000030502
> 2006/06/16, 00047.230000000000,#0000030502
> 2005/02/18, 00002.243425000000,#0000040505
> 2005/02/13, 00001.738765000000,#0000030627


....

> ID,Date,Amount, Refno
> 0000015511_I_2005/01/07, 2005/01/07, 00000.096532030000,#0000015511
> 0000015511_P_2005/06/07, 2005/06/07, 00006.963788280000,#0000015511
> 0000030502_I_2005/06/13, 2005/06/13, 00002.243425000000,#0000030502
> 0000030502_I_2006/06/16, 2006/06/16, 00002.243425000000,#0000030502
> 0000030502_P_2006/06/16, 2006/06/16, 00047.230000000000,#0000030502
> 0000040505_P_2005/02/18, 2005/02/18, 00002.243425000000,#0000040505
> 0000030627_P_2005/02/13, 2005/02/13, 00001.738765000000,#0000030627


I would use a hash where each Refno is a key and each value is a
reference to an array of hash references, assuming that the file is of
a reasonable size. You will probably need

perldoc -f split

Given this information, you can write some code now. Then, if you have
problems with your code, please post again.
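
For illustration only, here is a minimal sketch of the data structure described above. It is not Sinan's code: the name %by_refno, the hash keys "date" and "amount", and the split pattern are assumptions made for this example, and the sample rows are copied from the original post.

```perl
use strict;
use warnings;

# Sample rows copied from the original post (header line omitted).
my @lines = (
    '2005/01/07, 00000.096532030000,#0000015511',
    '2005/06/07, 00006.963788280000,#0000015511',
    '2005/06/13, 00002.243425000000,#0000030502',
);

# %by_refno is a hypothetical name: Refno => [ { date, amount }, ... ]
my %by_refno;
for my $line (@lines) {
    # split on a comma followed by optional whitespace and an optional '#'
    my ( $date, $amount, $refno ) = split /,\s*#?/, $line;
    push @{ $by_refno{$refno} }, { date => $date, amount => $amount };
}

for my $refno ( sort keys %by_refno ) {
    my @rows = @{ $by_refno{$refno} };
    for my $i ( 0 .. $#rows ) {
        # the last row stored for a Refno gets "P", earlier ones get "I"
        my $flag = $i == $#rows ? 'P' : 'I';
        print "${refno}_${flag}_$rows[$i]{date}, $rows[$i]{date}, ",
              "$rows[$i]{amount},#$refno\n";
    }
}
```

Note that a hash does not remember the order of the file, so this sketch prints the groups sorted by Refno. If the original row order matters, the Refnos would also have to be recorded in an array as they are first seen.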

In the meantime, you might benefit from reading

perldoc perlreftut

as well as the posting guidelines for this group.

Sinan


--
A. Sinan Unur <(E-Mail Removed)>
(reverse each component and remove .invalid for email address)

comp.lang.perl.misc guidelines on the WWW:
http://mail.augustmail.com/~tadmc/cl...uidelines.html
 
kingpin2502
      06-17-2005
Sinan

Thanks for your response. I've got a start, which is what I needed. I
must admit I wasn't aware of the rules prior to posting, but I'll read
them before I post again.

Thanks.

Emmon

 
John W. Krahn
      06-17-2005
(E-Mail Removed) wrote:
>
> I have been tasked with producing a new input file which requires some
> manipulation of a file to generate a unique ID. I have been advised
> that Perl will be the simplest course of action here but in all
> honesty, I'm not sure where to start.
>
> My input file contains the following snippets of data.
>
> Date, Amount, Refno
> 2005/01/07, 00000.096532030000,#0000015511
> 2005/06/07, 00006.963788280000,#0000015511
> 2005/06/13, 00002.243425000000,#0000030502
> 2006/06/16, 00002.243425000000,#0000030502
> 2006/06/16, 00047.230000000000,#0000030502
> 2005/02/18, 00002.243425000000,#0000040505
> 2005/02/13, 00001.738765000000,#0000030627
>
> Based on this file, I need to generate a new file containing the same
> fields but with an added column for the Unique id.
>
> The premise is simple. Check the Refno column and compare its value
> with the corresponding value in the next row. If they match, append
> both "I" and the Date to the Refno to generate the ID. Repeat this for
> each row until the last occurrence of the Refno is reached. On that
> last occurrence, i.e. when the next row starts a new Refno sequence,
> append a "P" instead of an "I".
>
> Therefore, using the sample above, the result I would expect is as
> follows
>
> ID,Date,Amount, Refno
> 0000015511_I_2005/01/07, 2005/01/07, 00000.096532030000,#0000015511
> 0000015511_P_2005/06/07, 2005/06/07, 00006.963788280000,#0000015511
> 0000030502_I_2005/06/13, 2005/06/13, 00002.243425000000,#0000030502
> 0000030502_I_2006/06/16, 2006/06/16, 00002.243425000000,#0000030502
> 0000030502_P_2006/06/16, 2006/06/16, 00047.230000000000,#0000030502
> 0000040505_P_2005/02/18, 2005/02/18, 00002.243425000000,#0000040505
> 0000030627_P_2005/02/13, 2005/02/13, 00001.738765000000,#0000030627
>
> If anyone can provide any assistance here, I'd really be grateful.


use warnings;
use strict;

my %seen;

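print
    reverse                                  # restore the original order
    map $_->[2]                              # a Refno was captured
        ? "$_->[2]_" . ( $seen{ $_->[2] }++ ? 'I' : 'P' ) .
          "_$_->[1], $_->[0]"                # ID, then the original line
        : $_->[0],                           # no match (header): unchanged
    map [ $_, m!^([\d/]+)[^#]+#(\d+)$! ],    # [ line, date, refno ]
    reverse                                  # scan from the last line up
    <DATA>;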


__DATA__
Date, Amount, Refno
2005/01/07, 00000.096532030000,#0000015511
2005/06/07, 00006.963788280000,#0000015511
2005/06/13, 00002.243425000000,#0000030502
2006/06/16, 00002.243425000000,#0000030502
2006/06/16, 00047.230000000000,#0000030502
2005/02/18, 00002.243425000000,#0000040505
2005/02/13, 00001.738765000000,#0000030627
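
The reverse/map pipeline above is compact. As a rough sketch (not John's code), the same bottom-up idea can be unrolled into an explicit loop: walking the lines from last to first, the first time a Refno is met is its last occurrence in the file, so it gets "P", and every later sighting gets "I". The helper name add_ids is hypothetical, and unlike the pipeline this version skips lines that do not match instead of printing them unchanged.

```perl
use strict;
use warnings;

# add_ids() is a hypothetical helper name; it takes data lines without
# trailing newlines and returns them with the generated ID prepended.
sub add_ids {
    my @lines = @_;
    my ( %seen, @out );
    for my $line ( reverse @lines ) {            # walk from the bottom up
        my ( $date, $refno ) = $line =~ m!^([\d/]+)[^#]+#(\d+)$!
            or next;                             # skip lines that do not match
        # first sighting from the bottom is the last occurrence: "P"
        my $flag = $seen{$refno}++ ? 'I' : 'P';
        unshift @out, "${refno}_${flag}_$date, $line";
    }
    return @out;
}

print "$_\n" for add_ids(
    '2005/06/13, 00002.243425000000,#0000030502',
    '2006/06/16, 00002.243425000000,#0000030502',
    '2006/06/16, 00047.230000000000,#0000030502',
    '2005/02/18, 00002.243425000000,#0000040505',
);
```

Because %seen covers the whole input, a Refno that reappears further down the file after other Refnos is treated as one group. That differs from a strict "compare with the next row" reading of the question.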



John
--
use Perl;
program
fulfillment
 
Ilmari Karonen
      06-17-2005
(E-Mail Removed) <(E-Mail Removed)> wrote on 17.06.2005:
>
> My input file contains the following snippets of data.
>
> Date, Amount, Refno
> 2005/01/07, 00000.096532030000,#0000015511
> 2005/06/07, 00006.963788280000,#0000015511
> 2005/06/13, 00002.243425000000,#0000030502
> 2006/06/16, 00002.243425000000,#0000030502
> 2006/06/16, 00047.230000000000,#0000030502
> 2005/02/18, 00002.243425000000,#0000040505
> 2005/02/13, 00001.738765000000,#0000030627
>
> The premise is simple. Check the Refno column and compare its value
> with the corresponding value in the next row. If they match, append
> both "I" and the Date to the Refno to generate the ID. Repeat this for
> each row until the last occurrence of the Refno is reached. On that
> last occurrence, i.e. when the next row starts a new Refno sequence,
> append a "P" instead of an "I".


Okay, since you need to look ahead to the next line, it would probably
be easiest to first slurp all the data and then iterate over it. We
can split each line into an array, which will make manipulating the
fields easier, and then reassemble the lines afterwards. So:

#!/usr/bin/perl
use warnings;
use strict;

my @lines = <>; # slurp all lines from input
chomp @lines; # remove newlines
shift @lines; # remove first line (column names)

# split the lines on commas followed by a space or a number sign (#):
my @data = map [split /,[# ]/], @lines;

print "ID, Date, Amount,#Refno\n"; # print new header line

foreach my $i (0 .. $#data) {
    my ($date, $amount, $refno) = @{ $data[$i] };  # columns of this row
    my $next = $data[$i+1][-1] || "";              # last col of next row
    my $char = ($refno eq $next ? "I" : "P");      # I if equal, else P
    my $id   = join "_", $refno, $char, $date;     # construct id
    print "$id, $date, $amount,#$refno\n";         # print rebuilt line
}

There, that should do it. Hopefully the comments are clear enough
that you can see how it works. In fact, this turned out to be quite a
nice little example of several common Perl idioms.

One idiom that may not be immediately obvious is $data[$i+1][-1] || "".
The array indexing works just as the comment says, but the "logical
or" with an empty string may be puzzling. In fact, all it does is
eliminate an unnecessary warning. When we reach the last line, and
try to access the last column of the line after that, we get an
undefined value. The "logical or" replaces it with an empty string.
It won't affect the values on other lines, because those are all
considered by perl to be logically true.
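
A small sketch of that point, with one caveat added here rather than taken from the post: || also replaces values that are defined but false, such as the string "0". The defined-or operator //, which arrived in Perl 5.10 and is therefore newer than this thread, replaces only undef. The variable names below are made up for the example.

```perl
use strict;
use warnings;

# A single data row, so index 1 is one past the end.
my @data = ( [ '2005/02/13', '00001.738765000000', '0000030627' ] );

# Reading past the last row yields undef; || "" turns it into "".
my $next = $data[1][-1] || "";
print "next is empty\n" if $next eq "";

# Caveat: || also replaces defined-but-false values such as "0".
my $zero_or = "0" || "";      # becomes "": the "0" is lost

# Defined-or (//, Perl 5.10 and later) only replaces undef.
my $zero_dor = "0" // "";     # stays "0"

print "||: '$zero_or'  //: '$zero_dor'\n";
```

For the Refnos in this thread the difference does not matter, since a value like "0000030627" is a non-empty string other than "0" and is therefore true.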

--
Ilmari Karonen
To reply by e-mail, please replace ".invalid" with ".net" in address.
 
kingpin2502
      06-20-2005
Jim

I am very grateful for this.

Thank you
Emmon

 
kingpin2502
      06-20-2005
Hi Ilmari

That was very clear thank you. I appreciate that very much.

Thanks
Emmon

 
Sherm Pendley
      06-20-2005
"kingpin2502" <(E-Mail Removed)> writes:

> I am very grateful for this.


For what?

sherm--
 
kingpin2502
      06-20-2005
John

Thanks for your help with this. I really appreciate it.

Thanks
Emmon

 
Sherm Pendley
      06-20-2005
"kingpin2502" <(E-Mail Removed)> writes:

> That was very clear thank you. I appreciate that very much.


*What* was very clear? Please quote enough of the message you're replying
to to provide sufficient context.

sherm--
 