Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > C++ > Byte array compare - Speed

Reply
Thread Tools

Byte array compare - Speed

 
 
MrCoder
Guest
Posts: n/a
 
      06-06-2004
Hey guys, my first post on here so I'll just say "Hello everbody!"

Ok heres my question for you lot.

Is there a faster way to compare 1 byte array to another?

This is my current code

// Check for a match

match = true;

for ( int i = 0; i < arraylen; i++)

{

if( data[i] != data2[i] )

{

match = false;

break;

}

}



Thanks in advance guys!!


 
Reply With Quote
 
 
 
 
JKop
Guest
Posts: n/a
 
      06-06-2004
MrCoder posted:

> Hey guys, my first post on here so I'll just say "Hello everbody!"
>
> Ok heres my question for you lot.
>
> Is there a faster way to compare 1 byte array to another?
>
> This is my current code
>
> // Check for a match
>
> match = true;
>
> for ( int i = 0; i < arraylen; i++)
>
> {
>
> if( data[i] != data2[i] )
>
> {
>
> match = false;
>
> break;
>
> }
>
> }
>
>
>
> Thanks in advance guys!!



If the array is of a variable length, as in your above example, then I would
say that Yes, your code is pretty efficent, although maybe there's some
memory functions I'm unaware of that could do the job better?


If the array is of constant length, I'd suggest the following:

Struct ChocolateArray
{
int monkey[50];
};


int main(void)
{
bool match;

ChocolateArray aa;

ChocolateArray bb;

//Mess with the arrays

if (aa == bb)
{
match = true;
}
else
{
match = false;
}
}


-JKop
 
Reply With Quote
 
 
 
 
ak
Guest
Posts: n/a
 
      06-06-2004
On Sun, 6 Jun 2004 09:23:28 +0000 (UTC), "MrCoder" <(E-Mail Removed)> wrote:

>>Hey guys, my first post on here so I'll just say "Hello everbody!"
>>
>>Ok heres my question for you lot.
>>
>>Is there a faster way to compare 1 byte array to another?
>>
>>This is my current code
>>
>>// Check for a match
>>
>>match = true;
>>
>>for ( int i = 0; i < arraylen; i++)
>>
>>{
>>
>>if( data[i] != data2[i] )
>>
>>{
>>
>>match = false;
>>
>>break;
>>
>>}
>>
>>}
>>
>>
>>
>>Thanks in advance guys!!
>>


memcmp( data, data2, arraylen );

 
Reply With Quote
 
Gianni Mariani
Guest
Posts: n/a
 
      06-06-2004
MrCoder wrote:
> Hey guys, my first post on here so I'll just say "Hello everbody!"
>
> Ok heres my question for you lot.
>
> Is there a faster way to compare 1 byte array to another?
>
> This is my current code
>
> // Check for a match
>
> match = true;
>
> for ( int i = 0; i < arraylen; i++)
>
> {
>
> if( data[i] != data2[i] )
>
> {
>
> match = false;
>
> break;
>
> }
>
> }


On most systems, memcmp may be nicely optimized, it may (on some
platforms) be inlined by the compiler.

i.e.

#include <cstring>
bool isequal( const char * a, const char * b, int len )
{
return ::memcmp( a, b, len ) == 0;
}

using gcc: g++ -mtune=i686 -O3 generates:
memcmp_tst.o: file format elf32-i386

Disassembly of section .text:

00000000 <_Z7isequalPKcS0_i>:
0: 55 push %ebp
1: 89 e5 mov %esp,%ebp
3: 83 ec 0c sub $0xc,%esp
6: 89 75 f8 mov %esi,0xfffffff8(%ebp)
9: 8b 4d 10 mov 0x10(%ebp),%ecx
c: 8b 75 08 mov 0x8(%ebp),%esi
f: 89 7d fc mov %edi,0xfffffffc(%ebp)
12: 8b 7d 0c mov 0xc(%ebp),%edi
15: fc cld
16: 39 c9 cmp %ecx,%ecx
18: f3 a6 repz cmpsb %es%edi),%ds%esi)
1a: 0f 94 c0 sete %al
1d: 8b 75 f8 mov 0xfffffff8(%ebp),%esi
20: 8b 7d fc mov 0xfffffffc(%ebp),%edi
23: 89 ec mov %ebp,%esp
25: 0f b6 c0 movzbl %al,%eax
28: 5d pop %ebp
29: c3 ret


(Notice no function calls !)

However, on some architectures (Risc - i.e. mips, sparc, hppa etc) there
are a couple of different fast paths and it depends alot on the
processor, cache and usage patterns.

On 64 bit cpu, you might be able to take advantage of reading in large
64 bit chunks at a time. Performance optimizations at this level can get
very involved but unless the profiling indicates that there is a huge
proportion of time spent in memcmp, it's just not worth your time and
effort to optimize it.

memcmp is probably the best choice for a balance of performance and
portability.

 
Reply With Quote
 
Siemel Naran
Guest
Posts: n/a
 
      06-06-2004
"Gianni Mariani" <(E-Mail Removed)> wrote in message news:c9vfo4
> MrCoder wrote:


> > match = true;
> >
> > for ( int i = 0; i < arraylen; i++)
> >
> > {
> >
> > if( data[i] != data2[i] )
> >
> > {
> >
> > match = false;
> >
> > break;
> >
> > }



> #include <cstring>
> bool isequal( const char * a, const char * b, int len )
> {
> return ::memcmp( a, b, len ) == 0;
> }
>
> using gcc: g++ -mtune=i686 -O3 generates:
> memcmp_tst.o: file format elf32-i386
>
> Disassembly of section .text:
>
> 00000000 <_Z7isequalPKcS0_i>:
> 0: 55 push %ebp
> 1: 89 e5 mov %esp,%ebp
> 3: 83 ec 0c sub $0xc,%esp
> 6: 89 75 f8 mov %esi,0xfffffff8(%ebp)
> 9: 8b 4d 10 mov 0x10(%ebp),%ecx
> c: 8b 75 08 mov 0x8(%ebp),%esi
> f: 89 7d fc mov %edi,0xfffffffc(%ebp)
> 12: 8b 7d 0c mov 0xc(%ebp),%edi
> 15: fc cld
> 16: 39 c9 cmp %ecx,%ecx
> 18: f3 a6 repz cmpsb %es%edi),%ds%esi)
> 1a: 0f 94 c0 sete %al
> 1d: 8b 75 f8 mov 0xfffffff8(%ebp),%esi
> 20: 8b 7d fc mov 0xfffffffc(%ebp),%edi
> 23: 89 ec mov %ebp,%esp
> 25: 0f b6 c0 movzbl %al,%eax
> 28: 5d pop %ebp
> 29: c3 ret


How is the memcmp version different from the original version? Is it
faster? Sorry, I don't know how to read assembly well.


> memcmp is probably the best choice for a balance of performance and
> portability.


An implementation may specialize std::equal(begin, end, begin2) to call
std::memcmp for POD types. I don't know how many actually do though. My
implementation (Borland 6) does not.


 
Reply With Quote
 
MrCoder
Guest
Posts: n/a
 
      06-06-2004
Thanks for the answers guys!

Im going to try it using memcmp shortly and see what sort of speed increase
I get.


 
Reply With Quote
 
Siemel Naran
Guest
Posts: n/a
 
      06-06-2004
"JKop" <(E-Mail Removed)> wrote in message news:7FDwc.1527

> If the array is of constant length, I'd suggest the following:
>
> Struct ChocolateArray
> {
> int monkey[50];
> };
>
>
> int main(void)
> {
> bool match;
>
> ChocolateArray aa;
>
> ChocolateArray bb;
>
> //Mess with the arrays
>
> if (aa == bb)


There is no compiler defined operator== for structs or classes. There's
only operator=, copy constructor, and destructor.


 
Reply With Quote
 
Gianni Mariani
Guest
Posts: n/a
 
      06-06-2004
Siemel Naran wrote:
> "Gianni Mariani" <(E-Mail Removed)> wrote in message news:c9vfo4
>


>
>>#include <cstring>
>>bool isequal( const char * a, const char * b, int len )
>>{
>> return ::memcmp( a, b, len ) == 0;
>>}
>>
>>using gcc: g++ -mtune=i686 -O3 generates:
>>memcmp_tst.o: file format elf32-i386
>>
>>Disassembly of section .text:
>>
>>00000000 <_Z7isequalPKcS0_i>:
>> 0: 55 push %ebp
>> 1: 89 e5 mov %esp,%ebp
>> 3: 83 ec 0c sub $0xc,%esp
>> 6: 89 75 f8 mov %esi,0xfffffff8(%ebp)
>> 9: 8b 4d 10 mov 0x10(%ebp),%ecx
>> c: 8b 75 08 mov 0x8(%ebp),%esi
>> f: 89 7d fc mov %edi,0xfffffffc(%ebp)
>> 12: 8b 7d 0c mov 0xc(%ebp),%edi
>> 15: fc cld
>> 16: 39 c9 cmp %ecx,%ecx
>> 18: f3 a6 repz cmpsb %es%edi),%ds%esi)
>> 1a: 0f 94 c0 sete %al
>> 1d: 8b 75 f8 mov 0xfffffff8(%ebp),%esi
>> 20: 8b 7d fc mov 0xfffffffc(%ebp),%edi
>> 23: 89 ec mov %ebp,%esp
>> 25: 0f b6 c0 movzbl %al,%eax
>> 28: 5d pop %ebp
>> 29: c3 ret

>
>
> How is the memcmp version different from the original version? Is it
> faster? Sorry, I don't know how to read assembly well.


The disassembly shows that memcmp functionality is inlined - there is no
call to the memcmp function. It's replaced with the ia32 "repz" prefix
to the "cmps" instruction. This is a single instuction that is
equivalent to the entire for loop in the op's example.

isequal() equivalent to isequal2() below.

bool isequal2( const char * a, const char * b, int len )
{

while ( len -- )
{
if ( * (a++) != * (b++) )
{
return false;
}
}
return true;

}

memcmp_tst.o: file format elf32-i386

Disassembly of section .text:

00000000 <_Z8isequal2PKcS0_i>:
0: 55 push %ebp
1: 89 e5 mov %esp,%ebp
3: 56 push %esi
4: 53 push %ebx
5: 8b 75 08 mov 0x8(%ebp),%esi
8: 8b 5d 0c mov 0xc(%ebp),%ebx
b: 8b 4d 10 mov 0x10(%ebp),%ecx
e: eb 0c jmp 1c <_Z8isequal2PKcS0_i+0x1c>
10: 0f b6 13 movzbl (%ebx),%edx
13: 43 inc %ebx
14: 0f b6 06 movzbl (%esi),%eax
17: 46 inc %esi
18: 38 d0 cmp %dl,%al
1a: 75 0f jne 2b <_Z8isequal2PKcS0_i+0x2b>
1c: 49 dec %ecx
1d: 83 f9 ff cmp $0xffffffff,%ecx
20: 75 ee jne 10 <_Z8isequal2PKcS0_i+0x10>
22: 5b pop %ebx
23: b8 01 00 00 00 mov $0x1,%eax
28: 5e pop %esi
29: 5d pop %ebp
2a: c3 ret
2b: 31 c0 xor %eax,%eax
2d: 5b pop %ebx
2e: 5e pop %esi
2f: 5d pop %ebp
30: c3 ret

As you can see, it has 2 conditional jumps "jne = jump if not equal".
Conditional jumps are costly in modern CPU's that do branch prediction
when the prediction is wrong. If the compiler was really really smart,
it *might* be able to figure out how to do the optimization to use "repz
cmp", but I have yet to see one !

>
>>memcmp is probably the best choice for a balance of performance and
>>portability.

>
>
> An implementation may specialize std::equal(begin, end, begin2) to call
> std::memcmp for POD types. I don't know how many actually do though. My
> implementation (Borland 6) does not.


It appears that gcc also does not. Using std::equal

00000040 <_Z8isequal3PKcS0_i>:
40: 55 push %ebp
41: 89 e5 mov %esp,%ebp
43: 53 push %ebx
44: 8b 55 08 mov 0x8(%ebp),%edx
47: 8b 45 10 mov 0x10(%ebp),%eax
4a: 8b 4d 0c mov 0xc(%ebp),%ecx
4d: 89 d3 mov %edx,%ebx
4f: 01 c3 add %eax,%ebx
51: eb 09 jmp 5c <_Z8isequal3PKcS0_i+0x1c>
53: 0f b6 01 movzbl (%ecx),%eax
56: 38 02 cmp %al,(%edx)
58: 75 11 jne 6b <_Z8isequal3PKcS0_i+0x2b>
5a: 42 inc %edx
5b: 41 inc %ecx
5c: 39 da cmp %ebx,%edx
5e: 75 f3 jne 53 <_Z8isequal3PKcS0_i+0x13>
60: 5b pop %ebx
61: b8 01 00 00 00 mov $0x1,%eax
66: 0f b6 c0 movzbl %al,%eax
69: 5d pop %ebp
6a: c3 ret
6b: 5b pop %ebx
6c: 31 c0 xor %eax,%eax
6e: 0f b6 c0 movzbl %al,%eax
71: 5d pop %ebp
72: c3 ret


Note the 2 jne's - basically the same as the previous loop.
 
Reply With Quote
 
Paul Mensonides
Guest
Posts: n/a
 
      06-07-2004
MrCoder wrote:
> Thanks for the answers guys!
>
> Im going to try it using memcmp shortly and see what sort of speed
> increase I get.


When you know the size, you can also unroll (or partially unroll) the loop.
This is a semi-generalized implementation:

#include <functional>
#include <iterator>

// identity

template<class T> struct identity {
typedef T type;
};

// map_integral

template<class T, T V> struct map_integral {
static const T value = V;
};

template<class T, T V> const T map_integral<T, V>::value;

// enable_if

template<bool, class R> struct enable_if { };
template<class R> struct enable_if<true, R> : identity<R> { };

// is_class

template<class T> class is_class {
private:
template<class U> static char check(int U::*);
template<class U> static char (& check(...))[2];
public:
static const bool value = sizeof(check<T>(0)) == 1;
};

template<class T> const bool is_class<T>::value;

// is_same

template<class T, class U> struct is_same : map_integral<bool, false> { };
template<class T> struct is_same<T, T> : map_integral<bool, true> { };

// is_pointer_to_function

template<class> struct is_pointer_to_function
: map_integral<bool, false> { };

template<> struct is_pointer_to_function<void*>
: map_integral<bool, false> { };

template<class T> struct is_pointer_to_function<T*>
: is_same<void (T), void (T*)> { };

// is_reference_to_function

template<class> struct is_reference_to_function
: map_integral<bool, false> { };

template<class T> struct is_reference_to_function<T&>
: is_same<void (T), void (T*)> { };

// compare

namespace detail {

template<class iterator, class predicate>
bool compare(
iterator a1, iterator a2,
iterator b1, iterator b2,
predicate pred, std::input_iterator_tag
) {
while (a1 != a2) {
if (b1 == b2 || !pred(*a1++, *b1++)) {
return false;
}
}
return b1 == b2;
}

template<class iterator, class predicate>
bool compare(
iterator a1, iterator a2,
iterator b1, iterator b2,
predicate pred, std::random_access_iterator_tag
) {
typedef typename std::iterator_traits<iterator>::difference_type
difference_type;
difference_type size(a2 - a1);
if (size != b2 - b1) {
return false;
}
// unrolling factor == 4
register difference_type n((size + 3) / 4);
#define body() \
if (!pred(*a1++, *b1++)) { \
return false; \
} \
/**/
switch (size % 4 - !size) {
case 0:
do {
body()
case 3: body()
case 2: body()
case 1: body()
} while (--n);
}
#undef body
return true;
}

} // detail

template<class iterator, class predicate>
inline bool compare(
iterator a1, iterator a2,
iterator b1, iterator b2,
predicate pred
) {
return detail::compare(
a1, a2, b1, b2,
pred, typename std::iterator_traits<iterator>::iterator_category( )
);
}

template<class iterator>
inline bool compare(iterator a1, iterator a2, iterator b1, iterator b2) {
return compare(
a1, a2, b1, b2,
std::equal_to<typename std::iterator_traits<iterator>::value_type>()
);
}

template<class T, class predicate>
inline bool compare(const T* a, const T* b, unsigned size, predicate pred) {
return compare(a, a + size, b, b + size, pred);
}

template<class T, unsigned long s1, unsigned long s2, class predicate>
inline typename enable_if<
is_class<predicate>::value
|| is_pointer_to_function<predicate>::value
|| is_reference_to_function<predicate>::value,
bool
>::type compare(T (& a)[s1], T (& b)[s2], predicate pred) {

return compare(a, a + s1, b, b + s2, pred);
}

template<class T> inline bool compare(const T* a, const T* b, unsigned size) {
return compare(a, b, size, std::equal_to<const T>());
}

template<class T, unsigned long s1, unsigned long s2>
inline bool compare(T (& a)[s1], T (& b)[s2]) {
return compare(a, b, std::equal_to<T>());
}

Regards,
Paul Mensonides


 
Reply With Quote
 
Siemel Naran
Guest
Posts: n/a
 
      06-07-2004
"Gianni Mariani" <(E-Mail Removed)> wrote in message news:c9vr41
> Siemel Naran wrote:


> > How is the memcmp version different from the original version? Is it
> > faster? Sorry, I don't know how to read assembly well.

>
> The disassembly shows that memcmp functionality is inlined - there is no
> call to the memcmp function. It's replaced with the ia32 "repz" prefix
> to the "cmps" instruction. This is a single instuction that is
> equivalent to the entire for loop in the op's example.
>
> isequal() equivalent to isequal2() below.
>
> bool isequal2( const char * a, const char * b, int len )
> {
>
> while ( len -- )
> {
> if ( * (a++) != * (b++) )
> {
> return false;
> }
> }
> return true;
>
> }
>
> memcmp_tst.o: file format elf32-i386
>
> Disassembly of section .text:
>
> 00000000 <_Z8isequal2PKcS0_i>:
> 0: 55 push %ebp
> 1: 89 e5 mov %esp,%ebp
> 3: 56 push %esi
> 4: 53 push %ebx
> 5: 8b 75 08 mov 0x8(%ebp),%esi
> 8: 8b 5d 0c mov 0xc(%ebp),%ebx
> b: 8b 4d 10 mov 0x10(%ebp),%ecx
> e: eb 0c jmp 1c <_Z8isequal2PKcS0_i+0x1c>
> 10: 0f b6 13 movzbl (%ebx),%edx
> 13: 43 inc %ebx
> 14: 0f b6 06 movzbl (%esi),%eax
> 17: 46 inc %esi
> 18: 38 d0 cmp %dl,%al
> 1a: 75 0f jne 2b <_Z8isequal2PKcS0_i+0x2b>
> 1c: 49 dec %ecx
> 1d: 83 f9 ff cmp $0xffffffff,%ecx
> 20: 75 ee jne 10 <_Z8isequal2PKcS0_i+0x10>
> 22: 5b pop %ebx
> 23: b8 01 00 00 00 mov $0x1,%eax
> 28: 5e pop %esi
> 29: 5d pop %ebp
> 2a: c3 ret
> 2b: 31 c0 xor %eax,%eax
> 2d: 5b pop %ebx
> 2e: 5e pop %esi
> 2f: 5d pop %ebp
> 30: c3 ret
>
> As you can see, it has 2 conditional jumps "jne = jump if not equal".
> Conditional jumps are costly in modern CPU's that do branch prediction
> when the prediction is wrong. If the compiler was really really smart,
> it *might* be able to figure out how to do the optimization to use "repz
> cmp", but I have yet to see one !
>
> >
> >>memcmp is probably the best choice for a balance of performance and
> >>portability.

> >
> >
> > An implementation may specialize std::equal(begin, end, begin2) to call
> > std::memcmp for POD types. I don't know how many actually do though.

My
> > implementation (Borland 6) does not.

>
> It appears that gcc also does not. Using std::equal
>
> 00000040 <_Z8isequal3PKcS0_i>:
> 40: 55 push %ebp
> 41: 89 e5 mov %esp,%ebp
> 43: 53 push %ebx
> 44: 8b 55 08 mov 0x8(%ebp),%edx
> 47: 8b 45 10 mov 0x10(%ebp),%eax
> 4a: 8b 4d 0c mov 0xc(%ebp),%ecx
> 4d: 89 d3 mov %edx,%ebx
> 4f: 01 c3 add %eax,%ebx
> 51: eb 09 jmp 5c <_Z8isequal3PKcS0_i+0x1c>
> 53: 0f b6 01 movzbl (%ecx),%eax
> 56: 38 02 cmp %al,(%edx)
> 58: 75 11 jne 6b <_Z8isequal3PKcS0_i+0x2b>
> 5a: 42 inc %edx
> 5b: 41 inc %ecx
> 5c: 39 da cmp %ebx,%edx
> 5e: 75 f3 jne 53 <_Z8isequal3PKcS0_i+0x13>
> 60: 5b pop %ebx
> 61: b8 01 00 00 00 mov $0x1,%eax
> 66: 0f b6 c0 movzbl %al,%eax
> 69: 5d pop %ebp
> 6a: c3 ret
> 6b: 5b pop %ebx
> 6c: 31 c0 xor %eax,%eax
> 6e: 0f b6 c0 movzbl %al,%eax
> 71: 5d pop %ebp
> 72: c3 ret
>
>
> Note the 2 jne's - basically the same as the previous loop.


Thanks for your answers!


 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
convert form byte[4] to Int32 while retaining the binary value of the byte array jeff@foundrymusic.com C++ 20 09-07-2009 08:54 PM
Re: how to extend a byte[] array with a null byte? Tom McGlynn Java 4 04-18-2008 11:49 PM
Re: how to extend a byte[] array with a null byte? Patricia Shanahan Java 0 04-17-2008 06:47 PM
Converting a Primative byte array to a Byte array object Kirby Java 3 10-08-2004 03:01 AM
Appending byte[] to another byte[] array Bharat Bhushan Java 15 08-05-2003 07:52 PM



Advertisments