Velocity Reviews > Number set type

# Number set type

Heiko Wundram
Guest
Posts: n/a

 12-29-2005
Hi all!

I'm wondering whether there is some form of number set type implemented in
pure Python. I'm currently in the process of implementing one myself (for
an IPv4 address range type), and if there's some form of reference
implementation I'd love to see it. I've not found a reasonably complete
implementation anywhere so far.

To explain what I mean with number set: a set type that doesn't store
individual entries but as it only contains numbers can store ranges of
numbers.

Basically what it boils down to is implementing intersection reasonably
fast, because all other set operations can be reduced to intersection and
union (which is easy to implement). I have implementations for both, but
only union is fast with O(n), intersection is currently O(n^2).

But, anyway, if anybody knows of some implementation I can have a look at,
please let me know.

Thanks!

--- Heiko.

Justin Azoff
Guest
Posts: n/a

 12-29-2005
You could use IPy...
http://svn.23.nu/svn/repos/IPy/trunk/IPy.py is one location for it...

I wonder where you get O(n) and O(n^2) from... CIDR blocks are all
sequential.. All you need to store is the starting and ending address
or length. Then any set operation only has to deal with 4 numbers, and
should be literally a few lines of code with no loops.

Heiko Wundram
Guest
Posts: n/a

 12-29-2005
Justin Azoff wrote:
> You could use IPy...
> http://svn.23.nu/svn/repos/IPy/trunk/IPy.py is one location for it...

I know about IPy... But: it simply doesn't implement all I need it for.

> I wonder where you get O(n) and O(n^2) from... CIDR blocks are all
> sequential.. All you need to store is the starting and ending address
> or length. Then any set operation only has to deal with 4 numbers, and
> should be literally a few lines of code with no loops.

You make assumptions that my usage simply doesn't fill. An IP4Range must
consist of more than a single block of IP addresses (like 192.168.0.0/24
_and_ 192.168.10.0/24), that's why it's an IP4Range and not a IP4Network
(which I implement as an addr/mask pair in my IP4 abstraction).

An IP4Range can consist of several different ranges (which are of course
normalized and disjunct). I can specify what I meant by O(n) and O(n^2):

Union of two IP4Ranges is simply normalizing a concatenated list of both
IP4Range ranges. Normalizing takes O(log n)+O(n) = O(n) steps, where n is
the number of ranges in the combined IP4Range.

Intersection takes O(n^2) steps in my current implementation (which I know
is mathematically correct), where n is max(n_1,n_2) where n_1 is the number
of ranges in the first IP4Range and n_2 the number of ranges in the second
IP4Range respectively.

Intersecting two IP4Ranges can be done with fewer steps, and I think it
could be done in O(n) in the case of normalized and sorted ranges, and I
have a few ideas of myself, but I'm currently too lazy to try to prove them
correct.

Just as intersection and union can be done more efficiently, I guess that
several set operations can be done the simple way (like
self.difference(other) <=> self.union(other.inverse()), whereby inverse is
O(n) too), but still maybe there are better algorithms to do it directly.

Anyway, that's why I wanted to have a look at a ready implementation of a
number set type so that I might get a glimpse at somebody elses
implementation.

--- Heiko.

Heiko Wundram
Guest
Posts: n/a

 12-29-2005
Correction:

Heiko Wundram wrote:
> You make assumptions that my usage simply doesn't fill. An IP4Range must

be able to

> consist of more than a single block of IP addresses (like 192.168.0.0/24
> _and_ 192.168.10.0/24), that's why it's an IP4Range and not a IP4Network
> (which I implement as an addr/mask pair in my IP4 abstraction).

--- Heiko.

Justin Azoff
Guest
Posts: n/a

 12-29-2005
Heiko Wundram wrote:
> Union of two IP4Ranges is simply normalizing a concatenated list of both
> IP4Range ranges. Normalizing takes O(log n)+O(n) = O(n) steps, where n is
> the number of ranges in the combined IP4Range.

I see now If the ranges are sorted, I bet you could just iterate
through both at the same time, merging intersecting ranges where
possible.

> Intersection takes O(n^2) steps in my current implementation (which I know
> is mathematically correct), where n is max(n_1,n_2) where n_1 is the number
> of ranges in the first IP4Range and n_2 the number of ranges in the second
> IP4Range respectively.
>
> Intersecting two IP4Ranges can be done with fewer steps, and I think it
> could be done in O(n) in the case of normalized and sorted ranges, and I
> have a few ideas of myself, but I'm currently too lazy to try to prove them
> correct.

Yes.. if they are sorted, something like this should work:

def intersection(self, other):
ret = []
ai=iter(self.ranges)
bi=iter(other.ranges)
try :
a = ai.next()
b = bi.next()
except StopIteration:
return IP4Range([])

while 1:
try :
if a.intersects(b):
ret.append(a.intersection(b))
a = ai.next()
b = bi.next()
elif a.start < b.start:
a = ai.next()
else :
b = bi.next()
except StopIteration:
break
return IP4Range(ret)

--
- Justin

Justin Azoff
Guest
Posts: n/a

 12-29-2005
Justin Azoff wrote:
> Yes.. if they are sorted, something like this should work:

Oops, that was almost right, but it would skip some ranges.

This should always work:

....
while 1:
try :
if a.intersects(b):
ret.append(a.intersection(b))
if a.end < b.end:
a = ai.next()
else :
b = bi.next()
elif a.start < b.start:
a = ai.next()
else :
b = bi.next()
except StopIteration:
break
return RangeList(ret)

--
- Justin