One liner to remove duplicate records

 
 
Ninja Li
 
      04-30-2010
Hi,

I have a file with the following sample data delimited by "|" with
duplicate records:

20100430|20100429|John Smith|-0.07|-0.08|
20100430|20100429|John Smith|-0.07|-0.08|
20100430|20100429|Ashley Cole|1.09|1.08|
20100430|20100429|Bill Thompson|0.76|0.78|
20100429|20100428|Time Apache|2.10|2.24|

The first three fields "date_1", "date_2" and "name" are unique
identifiers of a record.

Is there a simple way, such as a one-liner, to remove duplicates like
the "John Smith" record?

Thanks in advance.

Nick Li
 
sln@netherlands.com
 
      04-30-2010
On Fri, 30 Apr 2010 08:55:12 -0700 (PDT), Ninja Li wrote:

>Hi,
>
>I have a file with the following sample data delimited by "|" with
>duplicate records:
>
>20100430|20100429|John Smith|-0.07|-0.08|
>20100430|20100429|John Smith|-0.07|-0.08|
>20100430|20100429|Ashley Cole|1.09|1.08|
>20100430|20100429|Bill Thompson|0.76|0.78|
>20100429|20100428|Time Apache|2.10|2.24|
>
>The first three fields "date_1", "date_2" and "name" are unique
>identifiers of a record.
>
>Is there a simple way, like a one liner to remove the duplicates such
>as with "John Smith"?
>
>Thanks in advance.
>
>Nick Li


I could think of a way, but it takes 2 lines, sorry.
-sln
 
John Bokma
 
      04-30-2010
Ninja Li writes:

> Hi,
>
> I have a file with the following sample data delimited by "|" with
> duplicate records:
>
> 20100430|20100429|John Smith|-0.07|-0.08|
> 20100430|20100429|John Smith|-0.07|-0.08|
> 20100430|20100429|Ashley Cole|1.09|1.08|
> 20100430|20100429|Bill Thompson|0.76|0.78|
> 20100429|20100428|Time Apache|2.10|2.24|
>
> The first three fields "date_1", "date_2" and "name" are unique
> identifiers of a record.
>
> Is there a simple way, like a one liner to remove the duplicates such
> as with "John Smith"?


Yes.

But have you tried to write a multi-line Perl program first? Moving from
a working Perl program to a one-liner might be easier than starting
straight with the one-liner.

Also read up on what the various options of perl do.

--
John Bokma j3b

Hacking & Hiking in Mexico - http://johnbokma.com/
http://castleamber.com/ - Perl & Python Development
 
sln@netherlands.com
 
      04-30-2010
On Fri, 30 Apr 2010 09:06:52 -0700, sln@netherlands.com wrote:

>On Fri, 30 Apr 2010 08:55:12 -0700 (PDT), Ninja Li wrote:
>
>>Hi,
>>
>>I have a file with the following sample data delimited by "|" with
>>duplicate records:
>>
>>20100430|20100429|John Smith|-0.07|-0.08|
>>20100430|20100429|John Smith|-0.07|-0.08|
>>20100430|20100429|Ashley Cole|1.09|1.08|
>>20100430|20100429|Bill Thompson|0.76|0.78|
>>20100429|20100428|Time Apache|2.10|2.24|
>>
>>The first three fields "date_1", "date_2" and "name" are unique
>>identifiers of a record.
>>
>>Is there a simple way, like a one liner to remove the duplicates such
>>as with "John Smith"?
>>
>>Thanks in advance.
>>
>>Nick Li

>
>I could think of a way, but it takes 2 lines, sorry.


Wait, this might work.

c:\temp>perl -a -F"\|" -n -e "/^$/ and next or !exists $hash{$key = join '',@F[0...2]} and ++$hash{$key} and print" file.txt
20100430|20100429|John Smith|-0.07|-0.08|
20100430|20100429|Ashley Cole|1.09|1.08|
20100430|20100429|Bill Thompson|0.76|0.78|
20100429|20100428|Time Apache|2.10|2.24|

c:\temp>

-sln
 
Dr.Ruud
 
      04-30-2010
Ninja Li wrote:

> I have a file with the following sample data delimited by "|" with
> duplicate records:
>
> 20100430|20100429|John Smith|-0.07|-0.08|
> 20100430|20100429|John Smith|-0.07|-0.08|
> 20100430|20100429|Ashley Cole|1.09|1.08|
> 20100430|20100429|Bill Thompson|0.76|0.78|
> 20100429|20100428|Time Apache|2.10|2.24|
>
> The first three fields "date_1", "date_2" and "name" are unique
> identifiers of a record.
>
> Is there a simple way, like a one liner to remove the duplicates such
> as with "John Smith"?


If the data is as strict as presented, you can use

sort -u <input

sort <input |uniq

or simply use the whole line as a hash key:

perl -wne'$_{$_}++ or print' <input

(the first underscore is not really necessary)

--
Ruud
 
Jürgen Exner
 
      05-01-2010
Ninja Li wrote:
>I have a file with the following sample data delimited by "|" with
>duplicate records:
>
>20100430|20100429|John Smith|-0.07|-0.08|
>20100430|20100429|John Smith|-0.07|-0.08|
>20100430|20100429|Ashley Cole|1.09|1.08|
>20100430|20100429|Bill Thompson|0.76|0.78|
>20100429|20100428|Time Apache|2.10|2.24|
>
>The first three fields "date_1", "date_2" and "name" are unique
>identifiers of a record.
>
>Is there a simple way, like a one liner to remove the duplicates such
>as with "John Smith"?


The duplicates in your sample are adjacent, so a simple call to 'uniq' will do the job:
http://en.wikipedia.org/wiki/Uniq
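
Keep in mind that uniq only collapses identical lines that sit next to each other. A small sketch of what happens otherwise, assuming a Unix shell and a reordered copy of the sample in which the duplicate is no longer adjacent:

```shell
# Hypothetical reordering of the sample: the two "John Smith" lines
# are separated by another record.
input=$(mktemp)
printf '%s\n' \
  '20100430|20100429|John Smith|-0.07|-0.08|' \
  '20100430|20100429|Ashley Cole|1.09|1.08|' \
  '20100430|20100429|John Smith|-0.07|-0.08|' > "$input"

# uniq alone compares neighbouring lines only, so all three lines survive.
uniq_only=$(uniq "$input")

# Sorting first makes the duplicates adjacent, so uniq can drop one.
sorted_first=$(sort "$input" | uniq)

printf 'uniq only:\n%s\n\n' "$uniq_only"
printf 'sort | uniq:\n%s\n' "$sorted_first"
rm -f "$input"
```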

jue
 
sln@netherlands.com
 
      05-04-2010
On Fri, 30 Apr 2010 08:55:12 -0700 (PDT), Ninja Li wrote:

>Hi,
>
>I have a file with the following sample data delimited by "|" with
>duplicate records:
>
>20100430|20100429|John Smith|-0.07|-0.08|
>20100430|20100429|John Smith|-0.07|-0.08|
>20100430|20100429|Ashley Cole|1.09|1.08|
>20100430|20100429|Bill Thompson|0.76|0.78|
>20100429|20100428|Time Apache|2.10|2.24|
>
>The first three fields "date_1", "date_2" and "name" are unique
>identifiers of a record.
>
>Is there a simple way, like a one liner to remove the duplicates such
>as with "John Smith"?
>
>Thanks in advance.
>
>Nick Li


Another way:

perl -anF"\|" -e "tr/|// > 1 and ++$seen{qq<@F[0..2]>} > 1 and next or print" file.txt

-sln
 