Velocity Reviews > C++ > Vector question

Vector question

Kitty
Guest
Posts: n/a

 08-29-2004
Hi, everyone.

Given a vector<int>, what is the fastest way to find out whether there is a
repeated element in it? The result is just "true" or "false". Thanks.

Kitty

Jon Bell
Guest
Posts: n/a

 08-29-2004
In article <413160ad\$(E-Mail Removed)-cable.com>, Kitty <No spam> wrote:
>
>Given a vector<int>, what is the fastest way to find out whether there is a
>repeated element in it? The result is just "true" or "false". Thanks.

I'd construct an empty set<int>, then walk through the vector, processing
each element in turn. If the element is in the set already, stop and
return 'true'; otherwise insert the element into the set and continue.
If you reach the end of the vector without finding any duplicates, return
'false'.

--
Jon Bell <(E-Mail Removed)> Presbyterian College
Dept. of Physics and Computer Science Clinton, South Carolina USA

Siemel Naran
Guest
Posts: n/a

 08-29-2004
"Kitty" <No spam> wrote in message news:413160ad\$(E-Mail Removed)-cable.com...

> Given a vector<int>, what is the fastest way to find out whether there is

a
> repeated element in it? The result is just "true" or "false". Thanks.

Sounds like an interview question. There are many answers, each with
tradeoffs. Jon's method uses additional memory, but is fast time wise (time
is N*O(lg(N), space is O(N)), and easy to write too -- that's important too
because software that is easier to write and maintain pays off long term,
though this rarely comes up in interviews (though it should). You can also
not use additional memory, but sacrifice time (time would be O(N^2), space
is O(1)). And each method has its own varations, such as set or hashtable,
type of hash function if a hash, set or sorted vector, vector or list, etc,
etc.

Siemel Naran
Guest
Posts: n/a

 08-29-2004
"Casper" <(E-Mail Removed)> wrote in message news:GFeYc.73396

> In a similar problem, I use a different approach due to the fact that
> jon's solution is not an option as it consumes too much memmory.
>
> I use a modified qsort and in the comparation part if two sucessive
> elements are the same, I return true. This is slow with small lists but
> fast with huge datasets (30.000+ as in my application takes about 100ms)
> but require no additional memmory!

A good solution, I think the fastest possible without using extra space.
But it changes the original vector. Can you think of a way to do it without
changing the original?

Kai-Uwe Bux
Guest
Posts: n/a

 08-29-2004
"Kitty" <No spam> wrote:

> Hi, everyone.
>
> Given a vector<int>, what is the fastest way to find out whether there is
> a repeated element in it? The result is just "true" or "false". Thanks.
>
> Kitty

Here are three obvious ways of using standard algorithms. These solutions
have the advantage that they are easy to understand. The second one could
avoid copying if it was allowed to modify the sequence in the first place.

#include <vector>
#include <set>
#include <iostream>

template < typename ConstIter >
bool has_dublicates_a ( ConstIter from, ConstIter to ) {
typedef typename std::iterator_traits< ConstIter >::value_type value_type;
typedef typename std::set< value_type > set_type;
typedef typename set_type::size_type size_type;

size_type i = 0;
set_type s;
for ( ConstIter iter = from; iter != to; ++iter ) {
++ i;
s.insert( *iter );
// this test could also be done after the loop:
if ( s.size() != i ) {
return( true );
}
}
return( false );
}

template < typename ConstIter >
bool has_dublicates_b ( ConstIter from, ConstIter to ) {
typedef typename std::iterator_traits< ConstIter >::value_type value_type;
typedef typename std::vector< value_type > vector_type;

vector_type v ( from, to );
std::sort( v.begin(), v.end() );
return( std::adjacent_find( v.begin(), v.end() ) != v.end() );
}

int main( int argn, char ** args ){
std::vector< int > a;
std::vector< int > b;
for ( int i = 0; i < 10; ++i ) {
a.push_back( i );
b.push_back( i ); b.push_back( i );
}

std::cout << has_dublicates_a( a.begin(), a.end() )
<< " "
<< has_dublicates_a( b.begin(), b.end() ) << "\n";
std::cout << has_dublicates_b( a.begin(), a.end() )
<< " "
<< has_dublicates_b( b.begin(), b.end() ) << "\n";
}

As for speed, I would suggest a method based on the "partition by exchange"
idea underlying quicksort, possibly modified like introsort to avoid O(N^2)
worst case runtime. But that idea has been mentioned in another posting

Best

Kai-Uwe Bux

Alex Vinokur
Guest
Posts: n/a

 08-29-2004

"Kitty" <No spam> wrote in message news:413160ad\$(E-Mail Removed)-cable.com...
> Hi, everyone.
>
> Given a vector<int>, what is the fastest way to find out whether there is a
> repeated element in it? The result is just "true" or "false". Thanks.
>
> Kitty
>
>

See the thread titled "If vector contains only different elements" at

--
Alex Vinokur
http://mathforum.org/library/view/10978.html
http://sourceforge.net/users/alexvn

Kai-Uwe Bux
Guest
Posts: n/a

 08-29-2004
Siemel Naran wrote:

> "Casper" <(E-Mail Removed)> wrote in message news:GFeYc.73396
>
>> In a similar problem, I use a different approach due to the fact that
>> jon's solution is not an option as it consumes too much memmory.
>>
>> I use a modified qsort and in the comparation part if two sucessive
>> elements are the same, I return true. This is slow with small lists but
>> fast with huge datasets (30.000+ as in my application takes about 100ms)
>> but require no additional memmory!

>
> A good solution, I think the fastest possible without using extra space.

I agree that this is probably the fastest solution using comparisons. Given
that the question was for vector<int>, there could be a linear time
solution. Here is a (messy) draft of something like that. The idea is to do
partition by exchange based on the even/odd distinction (note that
dublicates have the same parity). So in one pass, we put all the even
elements to the left of all the odd elements. During the same pass, shift
down by one bit. Now, we return true if there is a dublicate in the left
segment or in the right segment. Farther down in the recursion, we can
sometimes return true just based on the length of the segment: if we have
shifted so many times that all values are in [0..15], then any segment of
length 17 is bound to contain dublicates. The run time for the worst case
should be O( N * bit_length<int> ).

#include <vector>
#include <iostream>
#include <kubux/sequenceio>

typedef std::vector< unsigned int > Uvector;

void print_range ( u_int* from, u_int* to ) {
std::cerr << "[ ";
while ( from <= to ) {
std::cerr << *from << " ";
++from;
}
std::cerr << "]";
}

bool has_dublicates_helper ( u_int* from, u_int* to, u_int max ) {
// WARNING, range is closed: [from,to]
print_range( from, to );
if ( to <= from ) {
return( false );
}
if ( max < to - from ) {
return( true );
}
u_int* low = from;
u_int* high = to;
while ( true ) {
while ( ( low < high ) && ( *low % 2 == 0 ) ) {
*low >>= 1;
++low;
}
while ( ( low < high ) && ( *high % 2 != 0 ) ) {
*high >>= 1;
--high;
}
// either ( low == high ) or ( *high is even )
if ( low < high ) {
// *low is odd and *high is even
std::swap( *low, *high );
} else {
break;
}
}
std::cerr << std::endl;
print_range( from, to );
std::cerr << std::endl;
// low == high;
if ( *low % 2 == 0 ) {
*low >>= 1;
return( has_dublicates_helper( from, low, max >> 1 )
||
has_dublicates_helper( high+1, to, max >>1 ) );
} else {

*low >>= 1;
return( has_dublicates_helper( from, low-1, max >> 1 )
||
has_dublicates_helper( high, to, max >>1 ) );
}
}

bool has_dublicates ( Uvector u_vect ) {
// this makes a copy
return( has_dublicates_helper( &u_vect[0], &u_vect[ u_vect.size()-1 ],
std::numeric_limits< unsigned int >::max() ) );
}

int main( int argn, char ** args ){
Uvector a;
Uvector b;
for ( int i = 0; i < 10; ++i ) {
a.push_back( i );
b.push_back( i ); b.push_back( i );
}

std::cout << has_dublicates( a )
<< " "
<< has_dublicates( b )
<< "\n";
}

> But it changes the original vector. Can you think of a way to do it
> without changing the original?

Same here, I do not see how to avoid that.

Best

Kai-Uwe Bux

Casper
Guest
Posts: n/a

 08-29-2004
In a similar problem, I use a different approach due to the fact that
jon's solution is not an option as it consumes too much memmory.

I use a modified qsort and in the comparation part if two sucessive
elements are the same, I return true. This is slow with small lists but
fast with huge datasets (30.000+ as in my application takes about 100ms)
but require no additional memmory!

/Casper

Kitty wrote:

> Hi, everyone.
>
> Given a vector<int>, what is the fastest way to find out whether there is a
> repeated element in it? The result is just "true" or "false". Thanks.
>
> Kitty
>
>

Guest
Posts: n/a

 08-29-2004

"Siemel Naran" <(E-Mail Removed)> wrote in message
news:GSeYc.268353\$(E-Mail Removed)...
> "Casper" <(E-Mail Removed)> wrote in message news:GFeYc.73396
>
> > In a similar problem, I use a different approach due to the fact that
> > jon's solution is not an option as it consumes too much memmory.
> >
> > I use a modified qsort and in the comparation part if two sucessive
> > elements are the same, I return true. This is slow with small lists but
> > fast with huge datasets (30.000+ as in my application takes about 100ms)
> > but require no additional memmory!

>
> A good solution, I think the fastest possible without using extra space.
> But it changes the original vector. Can you think of a way to do it

without
> changing the original?

If time is not an issue, perhaps this will do?

bool has_duplicates(vector<int> const& i_Ints)
{
for (
std::vector<int>::const_iterator cit = i_Ints.begin();
cit != i_Ints.end();
)
{
int i = *cit++;
std::vector<int>::const_iterator jit = cit;
while (jit != i_Ints.end())
if (*jit++ == i)
return true;
}
return false;
}
cheers,

>
>

Kitty
Guest
Posts: n/a

 08-29-2004
The most important thing is to keep elements in the same positions after
processing. That is sorting-like algorithm is not allowed.

"Kitty" <No spam> ¦b¶l¥ó news:413160ad\$(E-Mail Removed)-cable.com ¤¤¼¶¼g...
> Hi, everyone.
>
> Given a vector<int>, what is the fastest way to find out whether there is

a
> repeated element in it? The result is just "true" or "false". Thanks.
>
> Kitty
>
>