Efficiently de-duping an array

 
 
Dan Otterburn
      08-23-2007
I have an array of a number of items, some of which are duplicates. I
need to "de-dupe" the array, keeping the item with the lowest index.

my @fruits = qw(
apple
apple
pear
banana
pear
apple
banana
plum
plum
apple
plum
peach
kiwi
pear
plum
banana
cherry
);

The "apple" I want is $fruits[0], the "pear" $fruits[2] etc...

My current solution is:

my @fruits_deduped;
while (my $fruit = pop @fruits) {
    next if grep { $_ eq $fruit } @fruits;
    push @fruits_deduped, $fruit;
}
@fruits = reverse @fruits_deduped;

This seems to be a lot of work, is there a better way to do this?

--
Dan Otterburn <>
 
Gunnar Hjalmarsson
      08-24-2007
Dan Otterburn wrote:
> I have an array of a number of items, some of which are duplicates. I
> need to "de-dupe" the array, keeping the item with the lowest index.


<snip>

> My current solution is:
>
> my @fruits_deduped;
> while (my $fruit = pop @fruits) {
> next if grep { $_ eq $fruit } @fruits;
> push @fruits_deduped, $fruit;
> }
> @fruits = reverse @fruits_deduped;
>
> This seems to be a lot of work, is there a better way to do this?


Use a hash.

my ( @fruits_deduped, %seen );
while ( my $fruit = shift @fruits ) {
    push @fruits_deduped, $fruit unless $seen{$fruit}++;
}

See also "perldoc -q duplicate".
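
A note on the loop above: "while ( my $fruit = shift @fruits )" empties @fruits as it runs, and it stops early at the first element that is false in Perl ("0" or the empty string). Below is a minimal sketch of the same %seen idea written with foreach, which leaves the input array intact and copes with false values. The sample data containing "0" is an assumption added for illustration and is not from the original post.

```perl
use strict;
use warnings;

# Hypothetical sample data: includes "0", which is false in Perl.
my @items = qw(apple 0 apple pear 0 banana pear);

my ( @deduped, %seen );
foreach my $item (@items) {
    # $seen{$item}++ returns the old count, which is 0 (false)
    # the first time a given item is met.
    push @deduped, $item unless $seen{$item}++;
}

print "@deduped\n";    # apple 0 pear banana
```

The first occurrence of each item is kept, so the element with the lowest index survives, as the original question requires.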

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
 
Dan Otterburn
      08-24-2007
On Fri, 24 Aug 2007 02:42:03 +0200, Gunnar Hjalmarsson wrote:

> Use a hash.
>
> my ( @fruits_deduped, %seen );
> while ( my $fruit = shift @fruits ) {
> push @fruits_deduped, $fruit unless $seen{$fruit}++;
> }


Many thanks. Just to clarify my understanding: this works because
"unless" binds tighter than "++" so $seen{$fruit} - on the first pass for
each different $fruit - isn't auto-vivified until *after* "unless" has
tested? i.e. it is short-hand for:

while ( my $fruit = shift @fruits ) {
    if ( !$seen{$fruit} ) {
        push @fruits_deduped, $fruit;
        $seen{$fruit} += 1;
    }
}

> See also "perldoc -q duplicate".


Apologies, I should have been able to find this without posting.

--
Dan Otterburn <>
 
Tad McClellan
      08-24-2007
Dan Otterburn <> wrote:

> I have an array of a number of items, some of which are duplicates. I
> need to "de-dupe" the array,



Your Question is Asked Frequently:

perldoc -q duplicate

How can I remove duplicate elements from a list or array?


--
Tad McClellan
email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"
 
Gunnar Hjalmarsson
      08-24-2007
Dan Otterburn wrote:
> On Fri, 24 Aug 2007 02:42:03 +0200, Gunnar Hjalmarsson wrote:
>> Use a hash.
>>
>> my ( @fruits_deduped, %seen );
>> while ( my $fruit = shift @fruits ) {
>> push @fruits_deduped, $fruit unless $seen{$fruit}++;
>> }

>
> Many thanks. Just to clarify my understanding: this works because
> "unless" binds tighter than "++" so $seen{$fruit} - on the first pass for
> each different $fruit - isn't auto-vivified until *after* "unless" has
> tested?


Well, it's rather about what $seen{$fruit}++ _returns_; please read
about auto-increment in "perldoc perlop".
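
To make the point about the return value concrete, here is a small sketch (the key name is arbitrary). Post-increment returns the value the variable held before the increment, so the first test on a fresh key sees a false value and every later test sees a positive count. According to perlop, an undefined value is treated as 0 for this purpose.

```perl
use strict;
use warnings;

my %seen;

my $first  = $seen{apple}++;    # old value: 0 (false), the key was not set yet
my $second = $seen{apple}++;    # old value: 1 (true)

print "first: $first, second: $second, stored: $seen{apple}\n";
# first: 0, second: 1, stored: 2
```

That is why "push ... unless $seen{$fruit}++" pushes only on the first encounter: the condition is false exactly once per distinct key.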

> i.e. it is short-hand for:
>
> while ( my $fruit = shift @fruits ) {
> if ( !$seen{$fruit} ) {
> push @fruits_deduped, $fruit;
> $seen{$fruit} += 1;
> }
> }


Yes, almost. (Unlike my code, your code doesn't keep incrementing the
hash values.)
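
The difference mentioned here can be seen by inspecting the hash afterwards. In this sketch (a short sample list chosen for illustration, with variable names that are not from the thread), the post-increment form ends up with a count of occurrences per item, while the if-block form stops at 1. Both produce the same de-duplicated list.

```perl
use strict;
use warnings;

my @fruits = qw(apple apple pear banana pear apple);

# Form 1: post-increment in the condition; it runs for every element.
my ( @a, %count );
foreach my $fruit (@fruits) {
    push @a, $fruit unless $count{$fruit}++;
}

# Form 2: increment only inside the if block; it runs once per distinct item.
my ( @b, %flag );
foreach my $fruit (@fruits) {
    if ( !$flag{$fruit} ) {
        push @b, $fruit;
        $flag{$fruit} += 1;
    }
}

print "@a\n";                               # apple pear banana
print "@b\n";                               # apple pear banana
print "$count{apple} vs $flag{apple}\n";    # 3 vs 1
```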

>> See also "perldoc -q duplicate".

>
> Apologies, I should have been able to find this without posting.


Yes. Apology accepted.

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
 
Tad McClellan
      08-24-2007
Dan Otterburn <> wrote:
> On Fri, 24 Aug 2007 02:42:03 +0200, Gunnar Hjalmarsson wrote:
>
>> Use a hash.
>>
>> my ( @fruits_deduped, %seen );
>> while ( my $fruit = shift @fruits ) {
>> push @fruits_deduped, $fruit unless $seen{$fruit}++;
>> }

>
> Many thanks. Just to clarify my understanding: this works because
> "unless" binds tighter than "++"

^^^^^^^^^^^^^

"unless" is not an operator, so talking about its precedence makes
no sense.


> each different $fruit - isn't auto-vivified until *after* "unless" has
> tested?



That part is accurate though.


> i.e. it is short-hand for:
>
> while ( my $fruit = shift @fruits ) {
> if ( !$seen{$fruit} ) {
> push @fruits_deduped, $fruit;
> $seen{$fruit} += 1;
> }
> }
>
>> See also "perldoc -q duplicate".



....then do it with a grep():

my %seen;
@fruits = grep !$seen{$_}++, @fruits;

And it even reads kind of Englishy: "grep not seen fruits".
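
If the de-duplication is needed in more than one place, the grep idiom can be wrapped in a small subroutine. This is only a sketch; the name uniq_first is made up for this example. grep returns elements in their original order and lets through only the first occurrence of each value, so the item with the lowest index is the one kept. Values are compared as strings, because they are used as hash keys.

```perl
use strict;
use warnings;

# Hypothetical helper: returns its arguments with later duplicates removed.
sub uniq_first {
    my %seen;
    return grep { !$seen{$_}++ } @_;
}

my @fruits = qw(
    apple apple pear banana pear apple banana plum plum
    apple plum peach kiwi pear plum banana cherry
);

@fruits = uniq_first(@fruits);
print "@fruits\n";    # apple pear banana plum peach kiwi cherry
```

Recent versions of the core List::Util module also provide a uniq function with the same keep-the-first behaviour, which may be preferable where it is available.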


--
Tad McClellan
email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"
 
Dan Otterburn
      08-24-2007
On 24 Aug, 02:31, Tad McClellan <ta...@seesig.invalid> wrote:

> Your Question is Asked Frequently:


Thanks to both of you for being gentle and taking the time to answer
(and explain the answer to) a question that should never have been
asked.

We learn by our mistakes - and I have made plenty here - so, if it is
any consolation, I have learnt more than I would have done had I found
the FAQ in the first place. I will endeavour not to make the same
mistakes twice!

 
Dr.Ruud
      08-24-2007
Dan Otterburn wrote:

> We learn by our mistakes - and I have made plenty here - so, if it is
> any consolation, I have learnt more than I would have done had I found
> the FAQ in the first place. I will endeavour not to make the same
> mistakes twice!


don't_be_too_embarassed() if $seen{$mistake}++;

--
Affijn, Ruud

"Gewoon is een tijger."

 
Tad McClellan
      08-25-2007
Dan Otterburn <> wrote:
> On 24 Aug, 02:31, Tad McClellan <ta...@seesig.invalid> wrote:
>
>> Your Question is Asked Frequently:

>
> Thanks to both of you for being gentle and taking the time to answer
> (and explain the answer to) a question that should never have been
> asked.
>
> We learn by our mistakes - and I have made plenty here - so, if it is
> any consolation, I have learnt more than I would have done had I found
> the FAQ in the first place. I will endeavour not to make the same
> mistakes twice!



Making mistakes in a public forum is a very good way to "internalize"
a lesson. You're not likely to forget what has been learned.

I've "internalized" a bunch of stuff myself.


--
Tad McClellan
email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"
 