Siemel Naran wrote:
> "Casper" <(EMail Removed)> wrote in message news:GFeYc.73396
>
>> In a similar problem, I use a different approach due to the fact that
>> jon's solution is not an option as it consumes too much memmory.
>>
>> I use a modified qsort and in the comparation part if two sucessive
>> elements are the same, I return true. This is slow with small lists but
>> fast with huge datasets (30.000+ as in my application takes about 100ms)
>> but require no additional memmory!
>
> A good solution, I think the fastest possible without using extra space.
I agree that this is probably the fastest solution using comparisons. Given
that the question was for vector<int>, there could be a linear time
solution. Here is a (messy) draft of something like that. The idea is to do
partition by exchange based on the even/odd distinction (note that
dublicates have the same parity). So in one pass, we put all the even
elements to the left of all the odd elements. During the same pass, shift
down by one bit. Now, we return true if there is a dublicate in the left
segment or in the right segment. Farther down in the recursion, we can
sometimes return true just based on the length of the segment: if we have
shifted so many times that all values are in [0..15], then any segment of
length 17 is bound to contain dublicates. The run time for the worst case
should be O( N * bit_length<int> ).
#include <vector>
#include <iostream>
#include <kubux/sequenceio>
typedef std::vector< unsigned int > Uvector;
void print_range ( u_int* from, u_int* to ) {
std::cerr << "[ ";
while ( from <= to ) {
std::cerr << *from << " ";
++from;
}
std::cerr << "]";
}
bool has_dublicates_helper ( u_int* from, u_int* to, u_int max ) {
// WARNING, range is closed: [from,to]
print_range( from, to );
if ( to <= from ) {
return( false );
}
if ( max < to  from ) {
return( true );
}
u_int* low = from;
u_int* high = to;
while ( true ) {
while ( ( low < high ) && ( *low % 2 == 0 ) ) {
*low >>= 1;
++low;
}
while ( ( low < high ) && ( *high % 2 != 0 ) ) {
*high >>= 1;
high;
}
// either ( low == high ) or ( *high is even )
if ( low < high ) {
// *low is odd and *high is even
std::swap( *low, *high );
} else {
break;
}
}
std::cerr << std::endl;
print_range( from, to );
std::cerr << std::endl;
// low == high;
if ( *low % 2 == 0 ) {
*low >>= 1;
return( has_dublicates_helper( from, low, max >> 1 )

has_dublicates_helper( high+1, to, max >>1 ) );
} else {
*low >>= 1;
return( has_dublicates_helper( from, low1, max >> 1 )

has_dublicates_helper( high, to, max >>1 ) );
}
}
bool has_dublicates ( Uvector u_vect ) {
// this makes a copy
return( has_dublicates_helper( &u_vect[0], &u_vect[ u_vect.size()1 ],
std::numeric_limits< unsigned int >::max() ) );
}
int main( int argn, char ** args ){
Uvector a;
Uvector b;
for ( int i = 0; i < 10; ++i ) {
a.push_back( i );
b.push_back( i ); b.push_back( i );
}
std::cout << has_dublicates( a )
<< " "
<< has_dublicates( b )
<< "\n";
}
> But it changes the original vector. Can you think of a way to do it
> without changing the original?
Same here, I do not see how to avoid that.
Best
KaiUwe Bux
