Efficiently de-duping an array

Dan Otterburn

I have an array of a number of items, some of which are duplicates. I
need to "de-dupe" the array, keeping the item with the lowest index.

my @fruits = qw(
    apple
    apple
    pear
    banana
    pear
    apple
    banana
    plum
    plum
    apple
    plum
    peach
    kiwi
    pear
    plum
    banana
    cherry
);

The "apple" I want is $fruits[0], the "pear" is $fruits[2], and so on.

My current solution is:

my @fruits_deduped;
while (my $fruit = pop @fruits) {
    next if grep { $_ eq $fruit } @fruits;
    push @fruits_deduped, $fruit;
}
@fruits = reverse @fruits_deduped;

This seems to be a lot of work. Is there a better way to do this?
 
Gunnar Hjalmarsson

Dan said:
I have an array of a number of items, some of which are duplicates. I
need to "de-dupe" the array, keeping the item with the lowest index.

My current solution is:

my @fruits_deduped;
while (my $fruit = pop @fruits) {
    next if grep { $_ eq $fruit } @fruits;
    push @fruits_deduped, $fruit;
}
@fruits = reverse @fruits_deduped;

This seems to be a lot of work. Is there a better way to do this?

Use a hash.

my ( @fruits_deduped, %seen );
while ( my $fruit = shift @fruits ) {
    push @fruits_deduped, $fruit unless $seen{$fruit}++;
}

See also "perldoc -q duplicate".
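
A minimal sketch of the same hash idea wrapped in a sub, assuming you would rather not empty @fruits along the way. The name dedupe_keep_first is hypothetical and not part of any module. It walks the list with foreach rather than "while ( shift )", because the shift loop ends early if an element is false (for example "0" or the empty string).

```perl
use strict;
use warnings;

# Hypothetical helper: returns its arguments with later duplicates dropped,
# keeping the first (lowest-index) occurrence of each string.
sub dedupe_keep_first {
    my %seen;
    my @out;
    foreach my $item (@_) {
        # $seen{$item}++ returns the old count, which is false only
        # the first time a given string is encountered.
        push @out, $item unless $seen{$item}++;
    }
    return @out;
}

my @fruits  = qw(apple apple pear banana pear apple 0 plum 0);
my @deduped = dedupe_keep_first(@fruits);
print "@deduped\n";    # apple pear banana 0 plum
```

The original array is left untouched, so the caller can decide whether to assign the result back to @fruits.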
 
Dan Otterburn

Use a hash.

my ( @fruits_deduped, %seen );
while ( my $fruit = shift @fruits ) {
    push @fruits_deduped, $fruit unless $seen{$fruit}++;
}

Many thanks. Just to clarify my understanding: this works because
"unless" binds tighter than "++" so $seen{$fruit} - on the first pass for
each different $fruit - isn't auto-vivified until *after* "unless" has
tested? i.e. it is short-hand for:

while ( my $fruit = shift @fruits ) {
    if ( !$seen{$fruit} ) {
        push @fruits_deduped, $fruit;
        $seen{$fruit} += 1;
    }
}
See also "perldoc -q duplicate".

Apologies, I should have been able to find this without posting.
 
Tad McClellan

Dan Otterburn said:
I have an array of a number of items, some of which are duplicates. I
need to "de-dupe" the array,


Your Question is Asked Frequently:

perldoc -q duplicate

How can I remove duplicate elements from a list or array?
 
Gunnar Hjalmarsson

Dan said:
Many thanks. Just to clarify my understanding: this works because
"unless" binds tighter than "++" so $seen{$fruit} - on the first pass for
each different $fruit - isn't auto-vivified until *after* "unless" has
tested?

Well, it's rather about what $seen{$fruit}++ _returns_; please read
about auto-increment in "perldoc perlop".
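
A small sketch of that point: the whole expression $seen{$fruit}++ is evaluated, increment included, before "unless" makes its decision. What "unless" sees is the value the post-increment returns, which is the old value. For a hash element that does not exist yet, perlop documents that the returned value is 0 rather than undef.

```perl
use strict;
use warnings;

my %seen;

# First use: the element does not exist yet. Post-increment returns 0
# (false) and stores 1.
my $first = $seen{apple}++;

# Second use: post-increment returns the old value 1 (true) and stores 2.
my $second = $seen{apple}++;

print "first: $first, second: $second, stored: $seen{apple}\n";
# first: 0, second: 1, stored: 2
```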
i.e. it is short-hand for:

while ( my $fruit = shift @fruits ) {
    if ( !$seen{$fruit} ) {
        push @fruits_deduped, $fruit;
        $seen{$fruit} += 1;
    }
}

Yes, almost. (Unlike my code, your code doesn't keep incrementing the
hash values.)
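
To make the difference concrete, here is a sketch that runs both versions over the same short list (the hash names %count and %flag are just illustrative). Both produce the same de-duped array, but only the post-increment version ends up with a count of every occurrence.

```perl
use strict;
use warnings;

my @fruits = qw(apple apple pear banana pear apple);

# Post-increment in the condition: incremented on every pass,
# so the hash ends up holding occurrence counts.
my ( @deduped_a, %count );
for my $fruit (@fruits) {
    push @deduped_a, $fruit unless $count{$fruit}++;
}

# Increment inside the if block: only runs on the first sighting,
# so every value stays at 1.
my ( @deduped_b, %flag );
for my $fruit (@fruits) {
    if ( !$flag{$fruit} ) {
        push @deduped_b, $fruit;
        $flag{$fruit} += 1;
    }
}

print "@deduped_a\n";                    # apple pear banana
print "@deduped_b\n";                    # apple pear banana
print "$count{apple} $flag{apple}\n";    # 3 1
```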
Apologies, I should have been able to find this without posting.

Yes. ;-) Apology accepted.
 
Tad McClellan

Dan Otterburn said:
Many thanks. Just to clarify my understanding: this works because
"unless" binds tighter than "++"
^^^^^^^^^^^^^

"unless" is not an operator, so talking about its precedence makes
no sense.

each different $fruit - isn't auto-vivified until *after* "unless" has
tested?


That part is accurate though.

i.e. it is short-hand for:

while ( my $fruit = shift @fruits ) {
    if ( !$seen{$fruit} ) {
        push @fruits_deduped, $fruit;
        $seen{$fruit} += 1;
    }
}


....then do it with a grep():

my %seen;
@fruits = grep !$seen{$_}++, @fruits;

And it even reads kind of Englishy "grep not seen fruits" :)
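
A short sketch of the grep idiom next to List::Util's uniq, assuming a List::Util of version 1.45 or later is installed (that is the release where uniq was added). Both keep the first occurrence of each string and preserve the original order.

```perl
use strict;
use warnings;
use List::Util 1.45 qw(uniq);    # assumes List::Util >= 1.45 is available

my @fruits = qw(apple apple pear banana pear apple banana plum);

# The grep idiom: keep an element only when it has not been seen before.
my %seen;
my @via_grep = grep { !$seen{$_}++ } @fruits;

# The library routine: drops subsequent duplicates, same order.
my @via_uniq = uniq @fruits;

print "@via_grep\n";    # apple pear banana plum
print "@via_uniq\n";    # apple pear banana plum
```

Unlike the "while ( shift )" loop, neither form stops early on a false element such as "0".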
 
Dan Otterburn

Your Question is Asked Frequently:

Thanks to both of you for being gentle and taking the time to answer
(and explain the answer to) a question that should never have been
asked.

We learn by our mistakes - and I have made plenty here - so, if it is
any consolation, I have learnt more than I would have done had I found
the FAQ in the first place. I will endeavour not to make the same
mistakes twice!
 
Dr.Ruud

Dan Otterburn wrote:
We learn by our mistakes - and I have made plenty here - so, if it is
any consolation, I have learnt more than I would have done had I found
the FAQ in the first place. I will endeavour not to make the same
mistakes twice!

don't_be_too_embarrassed() if $seen{$mistake}++;
 
Tad McClellan

Dan Otterburn said:
Thanks to both of you for being gentle and taking the time to answer
(and explain the answer to) a question that should never have been
asked.

We learn by our mistakes - and I have made plenty here - so, if it is
any consolation, I have learnt more than I would have done had I found
the FAQ in the first place. I will endeavour not to make the same
mistakes twice!


Making mistakes in a public forum is a very good way to "internalize"
a lesson. You're not likely to forget what has been learned.

I've "internalized" a bunch of stuff myself. :)
 
