Working with Duplicates in Perl to generate Unique ID
Hi
I have been tasked with producing a new input file, which requires some manipulation of an existing file to generate a unique ID. I have been advised that Perl will be the simplest course of action here, but in all honesty, I'm not sure where to start.

My input file contains the following snippets of data:

```
Date, Amount, Refno
2005/01/07, 00000.096532030000,#0000015511
2005/06/07, 00006.963788280000,#0000015511
2005/06/13, 00002.243425000000,#0000030502
2006/06/16, 00002.243425000000,#0000030502
2006/06/16, 00047.230000000000,#0000030502
2005/02/18, 00002.243425000000,#0000040505
2005/02/13, 00001.738765000000,#0000030627
```

Based on this file, I need to generate a new file containing the same fields but with an added column for the unique ID.

The premise is simple. Check the Refno column and compare its value against the Refno in the next row. If they match, append "I" and the Date to the Refno to generate the ID. The process then iterates through the rows, repeating the same step until it reaches the last occurrence of the Refno. When we reach the last occurrence of the Refno, i.e. when the next row starts a new Refno sequence, we append a "P" instead.

Therefore, using the sample above, the result I would expect is as follows:

```
ID,Date,Amount, Refno
0000015511_I_2005/01/07, 2005/01/07, 00000.096532030000,#0000015511
0000015511_P_2005/06/07, 2005/06/07, 00006.963788280000,#0000015511
0000030502_I_2005/06/13, 2005/06/13, 00002.243425000000,#0000030502
0000030502_I_2006/06/16, 2006/06/16, 00002.243425000000,#0000030502
0000030502_P_2005/06/16, 2006/06/16, 00047.230000000000,#0000030502
0000030505_P_2005/02/18, 2005/02/18, 00002.243425000000,#0000040505
0000030627_P_2005/02/13, 2005/02/13, 00001.738765000000,#0000030627
```

If anyone can provide any assistance here, I'd really be grateful.

Regards.
Re: Working with Duplicates in Perl to generate Unique ID
esimbo@gmail.com wrote in news:1119019737.746603.282920@o13g2000cwo.googlegroups.com:

> I have been tasked with producing a new input file which requires some
> manipulation of a file to generate a unique ID. I have been advised
> that Perl will be the simplest course of action here but in all
> honesty, I'm not sure where to start.
>
> My input file contains the following snippets of data.
>
> Date, Amount, Refno
> 2005/01/07, 00000.096532030000,#0000015511
> 2005/06/07, 00006.963788280000,#0000015511
> 2005/06/13, 00002.243425000000,#0000030502
> 2006/06/16, 00002.243425000000,#0000030502
> 2006/06/16, 00047.230000000000,#0000030502
> 2005/02/18, 00002.243425000000,#0000040505
> 2005/02/13, 00001.738765000000,#0000030627

....

> ID,Date,Amount, Refno
> 0000015511_I_2005/01/07, 2005/01/07, 00000.096532030000,#0000015511
> 0000015511_P_2005/06/07, 2005/06/07, 00006.963788280000,#0000015511
> 0000030502_I_2005/06/13, 2005/06/13, 00002.243425000000,#0000030502
> 0000030502_I_2006/06/16, 2006/06/16, 00002.243425000000,#0000030502
> 0000030502_P_2005/06/16, 2006/06/16, 00047.230000000000,#0000030502
> 0000030505_P_2005/02/18, 2005/02/18, 00002.243425000000,#0000040505
> 0000030627_P_2005/02/13, 2005/02/13, 00001.738765000000,#0000030627

I would use a hash where each Refno is a key and the values are references to arrays of hash references, assuming that the file is a reasonable size. You will probably need

perldoc -f split

Given this information, you can write some code now. Then, if you have problems with your code, please post again. In the meantime, you might benefit from reading

perldoc perlreftut

as well as the posting guidelines for this group.

Sinan

--
A. Sinan Unur <1usa@llenroc.ude.invalid>
(reverse each component and remove .invalid for email address)

comp.lang.perl.misc guidelines on the WWW:
http://mail.augustmail.com/~tadmc/cl...uidelines.html
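A minimal sketch of the data structure Sinan describes, assuming the whole file fits in memory and the lines are already chomped. The sub name `add_ids` and the row keys (`line`, `date`, `refno`, `flag`) are hypothetical. Because rows are grouped by Refno, "P" goes to the last occurrence of a Refno anywhere in the file; that matches the question's next-row rule only when rows sharing a Refno are adjacent, as they are in the sample.

```perl
use strict;
use warnings;

# Sketch: a hash keyed by Refno whose values are references to arrays of
# hash references (one hash ref per row). Assumes the file fits in memory.
# add_ids is a hypothetical name; input lines must already be chomped.
sub add_ids {
    my ($header, @lines) = @_;
    my %rows_for;    # Refno => [ { line, date, refno, flag }, ... ]
    my @rows;        # the same hash refs, kept in original file order

    for my $line (@lines) {
        # Fields are separated by ", " and ",#"; the amount is not needed.
        my ($date, undef, $refno) = split /,\s*#?/, $line;
        my $row = { line => $line, date => $date, refno => $refno };
        push @rows, $row;
        push @{ $rows_for{$refno} }, $row;    # autovivifies the array ref
    }

    # Every row in a Refno group is "I" except the last one, which is "P".
    for my $group (values %rows_for) {
        $_->{flag} = 'I' for @$group;
        $group->[-1]{flag} = 'P';
    }

    my @out = ("ID,$header");
    for my $row (@rows) {
        push @out, "$row->{refno}_$row->{flag}_$row->{date}, $row->{line}";
    }
    return @out;
}

my @sample = (
    'Date, Amount, Refno',
    '2005/01/07, 00000.096532030000,#0000015511',
    '2005/06/07, 00006.963788280000,#0000015511',
    '2005/06/13, 00002.243425000000,#0000030502',
    '2006/06/16, 00002.243425000000,#0000030502',
    '2006/06/16, 00047.230000000000,#0000030502',
    '2005/02/18, 00002.243425000000,#0000040505',
    '2005/02/13, 00001.738765000000,#0000030627',
);
print "$_\n" for add_ids(@sample);
```

To run it on a real file instead of the sample, something like `chomp(my @lines = <>); print "$_\n" for add_ids(@lines);` would do.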
Re: Working with Duplicates in Perl to generate Unique ID
Sinan
Thanks for your response. I've got a start, which is what I needed. I must admit I wasn't aware of the rules prior to posting, but I'll read them before I post again.

Thanks,
Emmon
Re: Working with Duplicates in Perl to generate Unique ID
esimbo@gmail.com wrote:
>
> I have been tasked with producing a new input file which requires some
> manipulation of a file to generate a unique ID. I have been advised
> that Perl will be the simplest course of action here but in all
> honesty, I'm not sure where to start.
>
> My input file contains the following snippets of data.
>
> Date, Amount, Refno
> 2005/01/07, 00000.096532030000,#0000015511
> 2005/06/07, 00006.963788280000,#0000015511
> 2005/06/13, 00002.243425000000,#0000030502
> 2006/06/16, 00002.243425000000,#0000030502
> 2006/06/16, 00047.230000000000,#0000030502
> 2005/02/18, 00002.243425000000,#0000040505
> 2005/02/13, 00001.738765000000,#0000030627
>
> Based on this file, I need to generate a new file containing the same
> fields but with an added column for the Unique id.
>
> The premise is simple. Check the refno column and match against that
> value against the corresponding value in the next row. If they both
> match, then apend append both "I" and the Date to the Refno to generate
> the ID. It then iterates through the rows repeating the same step until
> it reaches the last occurence of the Refno. When we reach the last
> occurence of the Refno, i.e we start a new Refno sequence, in which
> case we append a "P".
>
> Therefore, using the sample above, the result I would expect is as
> follows
>
> ID,Date,Amount, Refno
> 0000015511_I_2005/01/07, 2005/01/07, 00000.096532030000,#0000015511
> 0000015511_P_2005/06/07, 2005/06/07, 00006.963788280000,#0000015511
> 0000030502_I_2005/06/13, 2005/06/13, 00002.243425000000,#0000030502
> 0000030502_I_2006/06/16, 2006/06/16, 00002.243425000000,#0000030502
> 0000030502_P_2005/06/16, 2006/06/16, 00047.230000000000,#0000030502
> 0000030505_P_2005/02/18, 2005/02/18, 00002.243425000000,#0000040505
> 0000030627_P_2005/02/13, 2005/02/13, 00001.738765000000,#0000030627
>
> If anyone can provide any assistance here, I'd really be grateful.

```perl
use warnings;
use strict;

my %seen;

print reverse
    map $_->[2]
        ? "$_->[2]_" . ( $seen{ $_->[2] }++ ? 'I' : 'P' ) . "_$_->[1], $_->[0]"
        : $_->[0],
    map [ $_, m!^([\d/]+)[^#]+#(\d+)$! ],
    reverse <DATA>;

__DATA__
Date, Amount, Refno
2005/01/07, 00000.096532030000,#0000015511
2005/06/07, 00006.963788280000,#0000015511
2005/06/13, 00002.243425000000,#0000030502
2006/06/16, 00002.243425000000,#0000030502
2006/06/16, 00047.230000000000,#0000030502
2005/02/18, 00002.243425000000,#0000040505
2005/02/13, 00001.738765000000,#0000030627
```

John
--
use Perl;
program
fulfillment
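John's one-liner walks the lines from the bottom up: the first time it meets a Refno, it is looking at that Refno's last occurrence, so it emits "P", and every earlier row with the same Refno gets "I"; the final `reverse` restores the original order. The same technique spelled out with named variables might look like this. It is a sketch: `ids_by_reverse_scan` is a hypothetical name, input lines are assumed to be chomped, and unlike the one-liner it also rewrites the header line.

```perl
use strict;
use warnings;

# Readable sketch of the reverse-scan-plus-%seen technique.
# ids_by_reverse_scan is a hypothetical name; lines must be chomped.
sub ids_by_reverse_scan {
    my ($header, @rows) = @_;
    my %seen;    # Refnos already met while walking from the bottom up
    my @out;

    for my $line (reverse @rows) {
        # Same pattern as the one-liner: leading date, then digits after '#'.
        if (my ($date, $refno) = $line =~ m{^([\d/]+)[^#]+#(\d+)$}) {
            # First hit from the end = last occurrence in the file => "P".
            my $flag = $seen{$refno}++ ? 'I' : 'P';
            push @out, "${refno}_${flag}_$date, $line";
        }
        else {
            push @out, $line;    # pass through lines that don't look like data
        }
    }
    return ("ID,$header", reverse @out);    # restore the original order
}

my @sample = (
    'Date, Amount, Refno',
    '2005/01/07, 00000.096532030000,#0000015511',
    '2005/06/07, 00006.963788280000,#0000015511',
    '2005/06/13, 00002.243425000000,#0000030502',
    '2006/06/16, 00002.243425000000,#0000030502',
    '2006/06/16, 00047.230000000000,#0000030502',
    '2005/02/18, 00002.243425000000,#0000040505',
    '2005/02/13, 00001.738765000000,#0000030627',
);
print "$_\n" for ids_by_reverse_scan(@sample);
```

Because `%seen` spans the whole file, a Refno that reappears after a different one (say #1, #2, #1) gets "I" on its first row, whereas the next-row comparison described in the question would give it "P". For the sample data, where equal Refnos are adjacent, both rules agree.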
Re: Working with Duplicates in Perl to generate Unique ID
esimbo@gmail.com <esimbo@gmail.com> wrote on 17.06.2005:
>
> My input file contains the following snippets of data.
>
> Date, Amount, Refno
> 2005/01/07, 00000.096532030000,#0000015511
> 2005/06/07, 00006.963788280000,#0000015511
> 2005/06/13, 00002.243425000000,#0000030502
> 2006/06/16, 00002.243425000000,#0000030502
> 2006/06/16, 00047.230000000000,#0000030502
> 2005/02/18, 00002.243425000000,#0000040505
> 2005/02/13, 00001.738765000000,#0000030627
>
> The premise is simple. Check the refno column and match against that
> value against the corresponding value in the next row. If they both
> match, then apend append both "I" and the Date to the Refno to generate
> the ID. It then iterates through the rows repeating the same step until
> it reaches the last occurence of the Refno. When we reach the last
> occurence of the Refno, i.e we start a new Refno sequence, in which
> case we append a "P".

Okay, since you need to look ahead to the next line, it would probably be easiest to first slurp all the data and then iterate over it. We can split each line into an array, which will make manipulating the fields easier, and then reassemble the lines afterwards. So:

```perl
#!/usr/bin/perl
use warnings;
use strict;

my @lines = <>;     # slurp all lines from input
chomp @lines;       # remove newlines
shift @lines;       # remove first line (column names)

# split the lines on commas followed by a space or a number sign (#):
my @data = map [split /,[# ]/], @lines;

print "ID, Date, Amount,#Refno\n";                  # print new header line

foreach my $i (0 .. $#data) {
    my ($date, $amount, $refno) = @{ $data[$i] };   # columns of this row
    my $next = $data[$i+1][-1] || "";               # last col of next row
    my $char = ($refno eq $next ? "I" : "P");       # I if equal, else P
    my $id = join "_", $refno, $char, $date;        # construct id
    print "$id, $date, $amount,#$refno\n";          # print rebuilt line
}
```

There, that should do it. Hopefully the comments are clear enough that you can see how it works. In fact, this turned out to be quite a nice little example of several common Perl idioms.
One idiom that may not be immediately obvious is $data[$i+1][-1] || "". The array indexing works just as the comment says, but the "logical or" with an empty string may be puzzling. In fact, all it does is eliminate an unnecessary warning. When we reach the last line and try to access the last column of the line after that, we get an undefined value, and comparing it with eq would trigger a "Use of uninitialized value" warning under use warnings. The "logical or" replaces it with an empty string. It won't affect the values on other lines, because those are all non-empty strings other than "0", which Perl considers logically true.

--
Ilmari Karonen
To reply by e-mail, please replace ".invalid" with ".net" in address.
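Slurping is the easiest way to look ahead, as Ilmari says. For input too large to hold in memory, a one-row look-ahead buffer gives the same result: hold the previous row until the next one has been read, then decide between "I" and "P". This sketch also swaps `|| ""` for the defined-or operator `//` (Perl 5.10 and later), which substitutes the empty string only when the value is undefined, so a Refno of "0" would not be mistaken for a missing row. The sub names `stream_ids` and `emit_row` are hypothetical, and the demo reads from an in-memory filehandle instead of a real file.

```perl
use strict;
use warnings;
use 5.010;    # for the // (defined-or) operator

# Sketch: add IDs while holding only one pending row in memory.
# stream_ids and emit_row are hypothetical names.
sub stream_ids {
    my ($in, $out) = @_;
    my $header = <$in>;
    return unless defined $header;
    chomp $header;
    print {$out} "ID,$header\n";

    my $prev;    # [date, amount, refno] of the row still waiting for a flag
    while (my $line = <$in>) {
        chomp $line;
        next if $line eq '';                  # skip blank lines
        my $row = [ split /,[# ]/, $line ];   # same split as Ilmari's code
        emit_row($out, $prev, $row->[2]) if $prev;
        $prev = $row;
    }
    emit_row($out, $prev, undef) if $prev;    # no next row, so the last gets "P"
    return;
}

sub emit_row {
    my ($out, $row, $next_refno) = @_;
    my ($date, $amount, $refno) = @$row;
    # // yields '' only when $next_refno is undef (end of input).
    my $flag = $refno eq ($next_refno // '') ? 'I' : 'P';
    print {$out} "${refno}_${flag}_$date, $date, $amount,#$refno\n";
}

my $sample = <<'END';
Date, Amount, Refno
2005/01/07, 00000.096532030000,#0000015511
2005/06/07, 00006.963788280000,#0000015511
2005/06/13, 00002.243425000000,#0000030502
2006/06/16, 00002.243425000000,#0000030502
2006/06/16, 00047.230000000000,#0000030502
2005/02/18, 00002.243425000000,#0000040505
2005/02/13, 00001.738765000000,#0000030627
END

open my $in,  '<', \$sample    or die "Cannot read sample: $!";
open my $out, '>', \my $result or die "Cannot open buffer: $!";
stream_ids($in, $out);
close $out;
print $result;
```

Against real input, `stream_ids(\*STDIN, \*STDOUT)` would read standard input and write to standard output.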
Re: Working with Duplicates in Perl to generate Unique ID
Jim
I am very grateful for this.

Thank you,
Emmon
Re: Working with Duplicates in Perl to generate Unique ID
Hi Ilmari
That was very clear, thank you. I appreciate that very much.

Thanks,
Emmon
Re: Working with Duplicates in Perl to generate Unique ID
"kingpin2502" <esimbo@gmail.com> writes:
> I am very grateful for this.

For what?

sherm--
Re: Working with Duplicates in Perl to generate Unique ID
John
Thanks for your help with this. I really appreciated it.

Thanks,
Emmon
Re: Working with Duplicates in Perl to generate Unique ID
"kingpin2502" <esimbo@gmail.com> writes:
> That was very clear thank you. I appreciate that very much.

*What* was very clear? Please quote enough of the message you're replying to to provide sufficient context.

sherm--