On Fri, 30 Apr 2010 09:06:52 -0700,
wrote:
>On Fri, 30 Apr 2010 08:55:12 -0700 (PDT), Ninja Li <> wrote:
>
>>Hi,
>>
>>I have a file with the following sample data delimited by "|" with
>>duplicate records:
>>
>>20100430|20100429|John Smith|-0.07|-0.08|
>>20100430|20100429|John Smith|-0.07|-0.08|
>>20100430|20100429|Ashley Cole|1.09|1.08|
>>20100430|20100429|Bill Thompson|0.76|0.78|
>>20100429|20100428|Time Apache|2.10|2.24|
>>
>>The first three fields "date_1", "date_2" and "name" are unique
>>identifiers of a record.
>>
>>Is there a simple way, like a one liner to remove the duplicates such
>>as with "John Smith"?
>>
>>Thanks in advance.
>>
>>Nick Li
>
>I could think of a way, but it takes 2 lines, sorry.
Wait, this might work.
c:\temp>perl -a -F"\|" -n -e "/^$/ and next or !exists $hash{$key = join '',@F[0...2]} and ++$hash{$key} and print" file.txt
20100430|20100429|John Smith|-0.07|-0.08|
20100430|20100429|Ashley Cole|1.09|1.08|
20100430|20100429|Bill Thompson|0.76|0.78|
20100429|20100428|Time Apache|2.10|2.24|
c:\temp>
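
If the one-liner gets hard to read, the same idea could be spelled out as a short script. This is only a sketch: the sub name dedupe_lines and the inline @sample list are made up for illustration, and it assumes the first three fields never contain a "|" themselves. One small difference from the one-liner: the key is joined with "|" rather than with an empty string, so neighbouring fields cannot run together and make two different records look alike.

```perl
use strict;
use warnings;

# Keep only the first line seen for each (date_1, date_2, name) key.
# dedupe_lines is a hypothetical helper name, not part of any module.
sub dedupe_lines {
    my @lines = @_;
    my (%seen, @kept);
    for my $line (@lines) {
        next if $line =~ /^\s*$/;               # skip blank lines
        my @fields = split /\|/, $line, 4;      # first three fields plus the rest
        # Join with "|" so adjacent fields cannot merge inside the key;
        # missing fields on short lines count as empty strings.
        my $key = join '|', map { defined $_ ? $_ : '' } @fields[0 .. 2];
        push @kept, $line unless $seen{$key}++; # true only on first sighting
    }
    return @kept;
}

# Sample records copied from the original question.
my @sample = (
    "20100430|20100429|John Smith|-0.07|-0.08|\n",
    "20100430|20100429|John Smith|-0.07|-0.08|\n",
    "20100430|20100429|Ashley Cole|1.09|1.08|\n",
    "20100430|20100429|Bill Thompson|0.76|0.78|\n",
    "20100429|20100428|Time Apache|2.10|2.24|\n",
);

print dedupe_lines(@sample);
```

Run on the sample, this should print the same four lines as the one-liner above. To use it on a real file, the lines read from the file would be passed in instead of @sample.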
-sln