Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Python > Hamming Distance

Reply
Thread Tools

Hamming Distance

 
 
godavemon
Guest
Posts: n/a
 
      06-19-2008
I need to calculate the Hamming Distance of two integers. The hamming
distance is the number of bits in two integers that don't match. I
thought there'd be a function in math or scipy but i haven't been able
to find one. This is my function but it seems like there should be a
faster way. I do this computation many times and speed up is
important.

def hamdist( a, b , bits = 32):
def _hamdist( x, bits):
if bits:
return (x & 1) + _hamdist(x >> 1, bits-1)
return x & 1
return _hamdist( a ^ b, bits)


Another alternative would be to convert the XOR to a binary string and
count the # of 1's.

Which would be fastest? Are there better alternatives?

Thanks!
 
Reply With Quote
 
 
 
 
Raymond Hettinger
Guest
Posts: n/a
 
      06-19-2008
On Jun 19, 4:27*pm, godavemon <(E-Mail Removed)> wrote:
> I need to calculate the Hamming Distance of two integers. *The hamming
> distance is the number of bits in two integers that don't match. *I
> thought there'd be a function in math or scipy but i haven't been able
> to find one. *This is my function but it seems like there should be a
> faster way. *I do this computation many times and speed up is
> important.


The simplest speed-up is to break the inputs into n-length blocks and
then look them up in an n-by-n table.

Also, re-write your function to use iteration instead of recursion
(the latter is *very* expensive in Python).

The fastest way is to write a C function or use Pyrex.


Raymond
 
Reply With Quote
 
 
 
 
John Machin
Guest
Posts: n/a
 
      06-19-2008
On Jun 20, 9:27 am, godavemon <(E-Mail Removed)> wrote:
> I need to calculate the Hamming Distance of two integers. The hamming
> distance is the number of bits in two integers that don't match. I
> thought there'd be a function in math or scipy but i haven't been able
> to find one. This is my function but it seems like there should be a
> faster way. I do this computation many times and speed up is
> important.
>
> def hamdist( a, b , bits = 32):
> def _hamdist( x, bits):
> if bits:
> return (x & 1) + _hamdist(x >> 1, bits-1)
> return x & 1
> return _hamdist( a ^ b, bits)
>
> Another alternative would be to convert the XOR to a binary string and


Isn't it a "binary string" already?

> count the # of 1's.


My guess is that counting the 1-bits in (a ^ b) would be hard to beat,
and that a recursive function is just not in the race.

>
> Which would be fastest?


Implement both and time them.

> Are there better alternatives?


I doubt it.

BTW one presumes that your integers are non-negative ...

HTH

Cheers,
John
 
Reply With Quote
 
Mensanator
Guest
Posts: n/a
 
      06-20-2008
On Jun 19, 6:27*pm, godavemon <(E-Mail Removed)> wrote:
> I need to calculate the Hamming Distance of two integers. *The hamming
> distance is the number of bits in two integers that don't match. *I
> thought there'd be a function in math or scipy but i haven't been able
> to find one. *This is my function but it seems like there should be a
> faster way. *I do this computation many times and speed up is
> important.
>
> def hamdist( a, b , bits = 32):
> * * def _hamdist( x, bits):
> * * * * if bits:
> * * * * * * return (x & 1) + _hamdist(x >> 1, bits-1)
> * * * * return x & 1
> * * return _hamdist( a ^ b, bits)
>
> Another alternative would be to convert the XOR to a binary string and
> count the # of 1's.
>
> Which would be fastest? *Are there better alternatives?


Yep, use the gmpy module.

>>> import gmpy
>>> help(gmpy.hamdist)

Help on built-in function hamdist in module gmpy:
hamdist(...)
hamdist(x,y): returns the Hamming distance (number of bit-
positions
where the bits differ) between x and y. x and y must be mpz, or
else
get coerced to mpz.

>>> gmpy.hamdist(15,6)

2
>>> gmpy.hamdist(2**177149,10**53330)

61877
>>> gmpy.hamdist(2**177149-1,10**53330)

115285


>
> Thanks!


 
Reply With Quote
 
Robert Kern
Guest
Posts: n/a
 
      06-20-2008
godavemon wrote:
> I need to calculate the Hamming Distance of two integers. The hamming
> distance is the number of bits in two integers that don't match. I
> thought there'd be a function in math or scipy but i haven't been able
> to find one. This is my function but it seems like there should be a
> faster way. I do this computation many times and speed up is
> important.
>
> def hamdist( a, b , bits = 32):
> def _hamdist( x, bits):
> if bits:
> return (x & 1) + _hamdist(x >> 1, bits-1)
> return x & 1
> return _hamdist( a ^ b, bits)
>
>
> Another alternative would be to convert the XOR to a binary string and
> count the # of 1's.
>
> Which would be fastest? Are there better alternatives?


I think this works:

In [26]: hexbits = {'0': 0,
....: '1': 1,
....: '2': 1,
....: '3': 2,
....: '4': 1,
....: '5': 2,
....: '6': 2,
....: '7': 3,
....: '8': 1,
....: '9': 2,
....: 'A': 2,
....: 'B': 3,
....: 'C': 2,
....: 'D': 3,
....: 'E': 3,
....: 'F': 4}

In [28]: def hamming(a, b):
....: return sum(hexbits[h] for h in hex(a^b)[2:])
....:

In [29]: hamming(1,1)
Out[29]: 0

In [30]: hamming(1,2)
Out[30]: 2

In [31]: hamming(2,2)
Out[31]: 0

In [32]: hamming(2,3)
Out[32]: 1


This might be a wee faster, I haven't timed it:

sum(map(h.get, hex(a^b)[2:]))

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco

 
Reply With Quote
 
Matimus
Guest
Posts: n/a
 
      06-20-2008
On Jun 19, 4:27*pm, godavemon <(E-Mail Removed)> wrote:
> I need to calculate the Hamming Distance of two integers. *The hamming
> distance is the number of bits in two integers that don't match. *I
> thought there'd be a function in math or scipy but i haven't been able
> to find one. *This is my function but it seems like there should be a
> faster way. *I do this computation many times and speed up is
> important.
>
> def hamdist( a, b , bits = 32):
> * * def _hamdist( x, bits):
> * * * * if bits:
> * * * * * * return (x & 1) + _hamdist(x >> 1, bits-1)
> * * * * return x & 1
> * * return _hamdist( a ^ b, bits)
>
> Another alternative would be to convert the XOR to a binary string and
> count the # of 1's.
>
> Which would be fastest? *Are there better alternatives?
>
> Thanks!


I see no good reason to use recursion for this type of thing. Here are
some of my attempts:

Code:
from math import log

def yours(a, b , bits = 32):
     def _hamdist( x, bits):
         if bits:
             return (x & 1) + _hamdist(x >> 1, bits-1)
         return x & 1
     return _hamdist(a ^ b, bits)


def simple(a, b, bits=32):
    x = a ^ b
    return sum((x >> i & 1) for i in xrange(bits))

def lazy(a, b, bits=32):
    x = (a ^ b) & ((1 << bits) - 1)
    tot = 0
    while x:
        tot += x & 1
        x >>= 1
    return tot

def fancy(a, b, bits=32):
    x = (a ^ b) & ((1 << bits) - 1)
    tot = 0
    while x:
        tot += 1
        x ^= 1 << int(log(x, 2))
    return tot

test_vals = (
        ((0xffffffff, 0), 32),
        ((0,0), 0),
        ((1,0), 1),
        ((0x80000000, 0), 1),
        ((0x55555555, 0), 16)
        )

def test(f):
    test_vals = (
            ((0xffffffff, 0), 32), # ALL
            ((0,0), 0), # None
            ((1,0), 1), # First
            ((0x80000000, 0), 1), # Last
            ((0x55555555, 0), 16), # Every Other
            ((0xffff, 0), 16), # First Half
            ((0xffff0000, 0), 16), # Last Half
            )
    for i, (args, exp) in enumerate(test_vals):
        if f(*args) != exp:
            return 0
    return 1

if __name__ == "__main__":
    for f in (yours, simple, lazy, fancy):
        if not test(f):
            print "%s failed"%f.__name__
The python module `timeit` is handy for testing speed:

python -mtimeit -s"from hamdist import *" "test(yours)"
10000 loops, best of 3: 95.1 usec per loop

python -mtimeit -s"from hamdist import *" "test(simple)"
10000 loops, best of 3: 65.3 usec per loop

python -mtimeit -s"from hamdist import *" "test(lazy)"
10000 loops, best of 3: 59.8 usec per loop

python -mtimeit -s"from hamdist import *" "test(fancy)"
10000 loops, best of 3: 77.2 usec per loop

Even the ridiculous `fancy` version beat the recursive version.

Matt
 
Reply With Quote
 
Raymond Hettinger
Guest
Posts: n/a
 
      06-20-2008
Non-recursive, 8-bit block table lookup version:

def ham(a, b, ht=[hamdist(a,0) for a in range(256)]):
x = a ^ b
dist = 0
while x:
dist += ht[x & 255]
x >>= 8
return dist

Raymond


 
Reply With Quote
 
godavemon
Guest
Posts: n/a
 
      06-20-2008
Awesome! Thanks a lot.

On Jun 19, 5:00 pm, Mensanator <(E-Mail Removed)> wrote:
> On Jun 19, 6:27 pm, godavemon <(E-Mail Removed)> wrote:
>
>
>
> > I need to calculate the Hamming Distance of two integers. The hamming
> > distance is the number of bits in two integers that don't match. I
> > thought there'd be a function in math or scipy but i haven't been able
> > to find one. This is my function but it seems like there should be a
> > faster way. I do this computation many times and speed up is
> > important.

>
> > def hamdist( a, b , bits = 32):
> > def _hamdist( x, bits):
> > if bits:
> > return (x & 1) + _hamdist(x >> 1, bits-1)
> > return x & 1
> > return _hamdist( a ^ b, bits)

>
> > Another alternative would be to convert the XOR to a binary string and
> > count the # of 1's.

>
> > Which would be fastest? Are there better alternatives?

>
> Yep, use the gmpy module.
>
> >>> import gmpy
> >>> help(gmpy.hamdist)

>
> Help on built-in function hamdist in module gmpy:
> hamdist(...)
> hamdist(x,y): returns the Hamming distance (number of bit-
> positions
> where the bits differ) between x and y. x and y must be mpz, or
> else
> get coerced to mpz.
>
> >>> gmpy.hamdist(15,6)

> 2
> >>> gmpy.hamdist(2**177149,10**53330)

> 61877
> >>> gmpy.hamdist(2**177149-1,10**53330)

>
> 115285
>
>
>
> > Thanks!




 
Reply With Quote
 
godavemon
Guest
Posts: n/a
 
      06-20-2008
Great thanks!

On Jun 19, 5:37 pm, Matimus <(E-Mail Removed)> wrote:
> On Jun 19, 4:27 pm, godavemon <(E-Mail Removed)> wrote:
>
>
>
> > I need to calculate the Hamming Distance of two integers. The hamming
> > distance is the number of bits in two integers that don't match. I
> > thought there'd be a function in math or scipy but i haven't been able
> > to find one. This is my function but it seems like there should be a
> > faster way. I do this computation many times and speed up is
> > important.

>
> > def hamdist( a, b , bits = 32):
> > def _hamdist( x, bits):
> > if bits:
> > return (x & 1) + _hamdist(x >> 1, bits-1)
> > return x & 1
> > return _hamdist( a ^ b, bits)

>
> > Another alternative would be to convert the XOR to a binary string and
> > count the # of 1's.

>
> > Which would be fastest? Are there better alternatives?

>
> > Thanks!

>
> I see no good reason to use recursion for this type of thing. Here are
> some of my attempts:
>
>
Code:
> from math import log
>
> def yours(a, b , bits = 32):
>      def _hamdist( x, bits):
>          if bits:
>              return (x & 1) + _hamdist(x >> 1, bits-1)
>          return x & 1
>      return _hamdist(a ^ b, bits)
>
> def simple(a, b, bits=32):
>     x = a ^ b
>     return sum((x >> i & 1) for i in xrange(bits))
>
> def lazy(a, b, bits=32):
>     x = (a ^ b) & ((1 << bits) - 1)
>     tot = 0
>     while x:
>         tot += x & 1
>         x >>= 1
>     return tot
>
> def fancy(a, b, bits=32):
>     x = (a ^ b) & ((1 << bits) - 1)
>     tot = 0
>     while x:
>         tot += 1
>         x ^= 1 << int(log(x, 2))
>     return tot
>
> test_vals = (
>         ((0xffffffff, 0), 32),
>         ((0,0), 0),
>         ((1,0), 1),
>         ((0x80000000, 0), 1),
>         ((0x55555555, 0), 16)
>         )
>
> def test(f):
>     test_vals = (
>             ((0xffffffff, 0), 32), # ALL
>             ((0,0), 0), # None
>             ((1,0), 1), # First
>             ((0x80000000, 0), 1), # Last
>             ((0x55555555, 0), 16), # Every Other
>             ((0xffff, 0), 16), # First Half
>             ((0xffff0000, 0), 16), # Last Half
>             )
>     for i, (args, exp) in enumerate(test_vals):
>         if f(*args) != exp:
>             return 0
>     return 1
>
> if __name__ == "__main__":
>     for f in (yours, simple, lazy, fancy):
>         if not test(f):
>             print "%s failed"%f.__name__
>
>
> The python module `timeit` is handy for testing speed:
>
> python -mtimeit -s"from hamdist import *" "test(yours)"
> 10000 loops, best of 3: 95.1 usec per loop
>
> python -mtimeit -s"from hamdist import *" "test(simple)"
> 10000 loops, best of 3: 65.3 usec per loop
>
> python -mtimeit -s"from hamdist import *" "test(lazy)"
> 10000 loops, best of 3: 59.8 usec per loop
>
> python -mtimeit -s"from hamdist import *" "test(fancy)"
> 10000 loops, best of 3: 77.2 usec per loop
>
> Even the ridiculous `fancy` version beat the recursive version.
>
> Matt




 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Hamming Distance Martin Hansen Ruby 4 03-08-2011 05:40 AM
Hamming distance Niv VHDL 9 01-19-2011 10:40 PM
Hamming Distance avoidingspam2001@yahoo.com Java 7 03-17-2010 06:17 AM
C Unleashed Chapter 18: Hamming Codes jaysome C Programming 2 12-18-2007 08:25 AM
Low Pass Hamming filter design LabWINC Python 2 03-31-2006 07:31 AM



Advertisments