Velocity Reviews > nearest neighbor in 2D

# nearest neighbor in 2D

John Hunter
Guest
Posts: n/a

 11-03-2003

I have a list of two tuples containing x and y coord

(x0, y0)
(x1, y1)
...
(xn, yn)

Given a new point x,y, I would like to find the point in the list
closest to x,y. I have to do this a lot, in an inner loop, and then I
add each new point x,y to the list. I know the range of x and y in

One solution that comes to mind is to partition to space into
comes in, identify it's quadrant and only search the appropriate
quadrant for nearest neighbor. This could be done recursively, a 2D
binary search of sorts....

Can anyone point me to some code or module that provides the
appropriate data structures and algorithms to handle this task
efficiently? The size of the list will likely be in the range of
10-1000 elements.

Thanks,
John Hunter

Isaac To
Guest
Posts: n/a

 11-03-2003
>>>>> "John" == John Hunter <(E-Mail Removed)> writes:

John> Given a new point x,y, I would like to find the point in the list
John> closest to x,y. I have to do this a lot, in an inner loop, and
John> then I add each new point x,y to the list. I know the range of x

John> One solution that comes to mind is to partition to space into
John> quadrants and store the elements by quadrant. When a new element
John> comes in, identify it's quadrant and only search the appropriate
John> quadrant for nearest neighbor. This could be done recursively, a
John> 2D binary search of sorts....

By recursion your solution would work in O(log n) time. The construction
would take O(n log n) time. Unluckily, it can return the wrong point, as
the nearest point within the nearest quadrant might not be the nearest
point.

The problem is a well-studied basic computational geometry problem, although
I don't really know any Python code that actually do it. Try to look at the
web for "Voronoi diagrams" and "radial triangulation" to understand how to
solve it properly in the above mentioned (perhaps randomized) time
complexity.

Regards,
Isaac.

Noen
Guest
Posts: n/a

 11-03-2003
John Hunter wrote:

> I have a list of two tuples containing x and y coord
>
> (x0, y0)
> (x1, y1)
> ...
> (xn, yn)
>
> Given a new point x,y, I would like to find the point in the list
> closest to x,y. I have to do this a lot, in an inner loop, and then I
> add each new point x,y to the list. I know the range of x and y in
>
> One solution that comes to mind is to partition to space into
> quadrants and store the elements by quadrant. When a new element
> comes in, identify it's quadrant and only search the appropriate
> quadrant for nearest neighbor. This could be done recursively, a 2D
> binary search of sorts....
>
> Can anyone point me to some code or module that provides the
> appropriate data structures and algorithms to handle this task
> efficiently? The size of the list will likely be in the range of
> 10-1000 elements.
>
> Thanks,
> John Hunter
>
>

You could to a for loop, and inside that loop you will have a variable
lessest_distance. I dont know much geometric mathematics, but Im pretty
sure you can use pytagoras stuff to find the lenght from (Xn,Yn) to
(X,Y) using sinus cosinus and such.

And when the function is finished, you should return lessest_distance

Graham Lee
Guest
Posts: n/a

 11-03-2003
John Hunter wrote:

>
> One solution that comes to mind is to partition to space into
> quadrants and store the elements by quadrant. When a new element
> comes in, identify it's quadrant and only search the appropriate
> quadrant for nearest neighbor. This could be done recursively, a 2D
> binary search of sorts....
>

What happens when you put a particle in near/at the boundary of a quadrant
though? It's possible for the nearest neighbour to be in the nearest
neighbour quadrant...although you could search over these as well.
However, the number of independent areas implied by use of the word
'quadrant' suggests that this would be the same as iterating over all
space....
--
Graham Lee
Oxford

Guest
Posts: n/a

 11-04-2003
On Sun, 02 Nov 2003 22:12:47 -0600, John Hunter
<(E-Mail Removed)> wrote:

>
>I have a list of two tuples containing x and y coord
>
> (x0, y0)
> (x1, y1)
> ...
> (xn, yn)
>
>Given a new point x,y, I would like to find the point in the list
>closest to x,y. I have to do this a lot, in an inner loop, and then I
>add each new point x,y to the list. I know the range of x and y in
>
>One solution that comes to mind is to partition to space into
>comes in, identify it's quadrant and only search the appropriate
>quadrant for nearest neighbor. This could be done recursively, a 2D
>binary search of sorts....
>
>Can anyone point me to some code or module that provides the
>appropriate data structures and algorithms to handle this task
>efficiently? The size of the list will likely be in the range of
>10-1000 elements.
>
>Thanks,
>John Hunter
>

This is how I would do it. Maybe it's how you are already doing it?

import math
import random

n_points = 1000
max_x = 1000
max_y = 1000
closest_distance = 10000
closest_point = (max_x,max_y)

p = []
for i in xrange(n_points):
x = round(max_x*random.random())
y = round(max_y*random.random())
p.append((x, y))

new_point = (round(max_x*random.random()), \
round(max_y*random.random()))

for point in p:
distance = math.sqrt((new_point[0]-point[0])**2 \
+(new_point[1]-point[1])**2)
if distance < closest_distance:
closest_distance = distance
closest_point = point

print 'new_point:', new_point
print 'closest_point:', closest_point,' \
out of',n_points,'points.'

I really don't know how you can make this faster. There might be a
library that has a distance between two points function that could
speed it up.

http://www.velocityreviews.com/forums/(E-Mail Removed)

David Eppstein
Guest
Posts: n/a

 11-04-2003
> >I have a list of two tuples containing x and y coord
> >
> > (x0, y0)
> > (x1, y1)
> > ...
> > (xn, yn)
> >
> >Given a new point x,y, I would like to find the point in the list
> >closest to x,y. I have to do this a lot, in an inner loop, and then I
> >add each new point x,y to the list. I know the range of x and y in

Here's some not-very-heavily-tested code for doing this using a kD-tree.

Worst case efficiency is still linear per point or quadratic total
(unlike some other more sophisticated data structures) but in practice
if your points are reasonably well behaved this should be pretty good;
e.g. I tried it with 10000 random points (each queried then added) and
it made only 302144 recursive calls to nearestNeighbor.

Also note that only the test code at the end restricts to two
dimensions, everything else works in arbitrary numbers of dimensions.

def dist2(p,q):
"""Squared distance between p and q."""
d = 0
for i in range(len(p)):
d += (p[i]-q[i])**2
return d

class kdtree:
def __init__(self,dim=2,index=0):
self.dim = dim
self.index = index
self.split = None

"""Include another point in the kD-tree."""
if self.split is None:
self.split = p
self.left = kdtree(self.dim, (self.index + 1) % self.dim)
self.right = kdtree(self.dim, (self.index + 1) % self.dim)
elif self.split[self.index] < p[self.index]:
else:

def nearestNeighbor(self,q,maxdist2):
"""Find pair (d,p) where p is nearest neighbor and d is squared
distance to p. Returned distance must be within maxdist2; if
not, no point itself is returned.
"""
solution = (maxdist2+1,None)
if self.split is not None:
solution = min(solution, (dist2(self.split,q),self.split))
d2split = (self.split[self.index] - q[self.index])**2
if self.split[self.index] < p[self.index]:
solution = min(solution,
self.left.nearestNeighbor(q,solution[0]))
if d2split < solution[0]:
solution = min(solution,
self.right.nearestNeighbor(q,solution[0]))
else:
solution = min(solution,
self.right.nearestNeighbor(q,solution[0]))
if d2split < solution[0]:
solution = min(solution,
self.left.nearestNeighbor(q,solution[0]))
return solution

if __name__ == "__main__":
import math
import random

n_points = 50
max_x = 1000
max_y = 1000
max_dist2 = max_x**2 + max_y**2

k = kdtree()
for i in range(n_points):
x = round(max_x*random.random())
y = round(max_y*random.random())
p = (x,y)

if i == 0:
print 'new point',p
else:
d,q = k.nearestNeighbor(p,max_dist2)
print 'new point', p, 'has neighbor',
print q, 'at distance', math.sqrt(d)

--
David Eppstein http://www.ics.uci.edu/~eppstein/
Univ. of California, Irvine, School of Information & Computer Science

Bengt Richter
Guest
Posts: n/a

 11-04-2003
On Sun, 02 Nov 2003 22:12:47 -0600, John Hunter <(E-Mail Removed)> wrote:

>
>I have a list of two tuples containing x and y coord
>
> (x0, y0)
> (x1, y1)
> ...
> (xn, yn)
>
>Given a new point x,y, I would like to find the point in the list
>closest to x,y. I have to do this a lot, in an inner loop, and then I
>add each new point x,y to the list. I know the range of x and y in

Are you trying to find closest location to a mouse cursor as it moves,
and then adding a point when there's a click? I.e., your particular use case
might suggest a strategy that's different from, e.g., what you'd do if
each new point's coordinates where read from file or came from a generator,
and you had exactly one search leading to exactly one update of the set.
And also what you wanted to do with the completed set.

>
>One solution that comes to mind is to partition to space into
>comes in, identify it's quadrant and only search the appropriate
>quadrant for nearest neighbor. This could be done recursively, a 2D
>binary search of sorts....

This might be a way of pruning, but you'd have to take into account that
nearest square doesn't guarantee nearest diagonal distance. Just blathering
off the top of my head, ... I think I would try dividing x and y into maybe a
16*16 grid of squares. A new point will fall into one of those, and then if
you find some existing points in that square, you could brute force find the
closest (remembering that comparing squared radial distances works as well as
comparing their square roots and then see if that shortest distance can
reach into any adjacent squares, and search those too if so, since there could be
a point just the other side of the border, or diagonally across an adjacent
corner that could be closer than your currently determined distance.

You could keep info about points in a square in lists or dicts (16*16 might
be sparsely populated, best in a dict of squares accessed by grid coordinates
(i.e., 4 bits apiece, maybe as tuple or combined as a single number (but then
you could use a list pre-populated with None's instead of a dict, so either way).

I guess in the extreme you could compute a complete table of nearest point
coordinates for every possible x,y point, so you'd have a raster map of voronoi
regions, with each region colored by the coordinates of its nearest point. The more
points you had, the less info would have to be updated for each new point.
I wonder when the crossover would occur
>
>Can anyone point me to some code or module that provides the
>appropriate data structures and algorithms to handle this task
>efficiently? The size of the list will likely be in the range of
>10-1000 elements.
>

What are the ranges of x and y?

Regards,
Bengt Richter

John Hunter
Guest
Posts: n/a

 11-04-2003
>>>>> "Bengt" == Bengt Richter <(E-Mail Removed)> writes:

Bengt> Are you trying to find closest location to a mouse cursor
Bengt> as it moves, and then adding a point when there's a click?
Bengt> I.e., your particular use case might suggest a strategy
Bengt> that's different from, e.g., what you'd do if each new
Bengt> point's coordinates where read from file or came from a
Bengt> exactly one update of the set. And also what you wanted to
Bengt> do with the completed set.

I had two use cases just yesterday. The one that prompted the
question arose in making a contour plot. I'm defining a contour as an
ordered sequence of values over a 2D MxN matrix where the values
differ from some target value by at most some tolerance. I maintain a
list of i,j indices into the matrix for a given contour value, and
follow the contour from a given i,j location by examining its
neighbors. In order to close the loop (eg, when the contour finder
has passed once around a level curve of a mountain, I want to test
whether a given point i,j is close to a previously discovered point
k,l. Since I have a list of these 2 tuple coordinates, I want to find
the nearest neighbor in the list and terminate the contour when the
nearest neighbor falls within some minimum distance

3 4 5
2 6
13 1 7
12 8
11 10 9

In the example above, I am traversing a contour identifying points in
the order 1,2,3...; as above each point represents an i,j tuple which
is an index into the matrix I am contouring. I would like to
terminate the search at 13 rather than spiral around the existing
contour 1-12. Each time I add a new point to the contour, I would like
to query the existing list (excluding the most recently added points
which are close by construction) of locations and find the minimum
distance. If I'm not too close to the preexisting contour, I add the
new point and proceed.

As I write this I realize there is an important efficiency. Since
from an existing point I add the closest neighbor, the biggest step I
can make is 1,1. If on the last nearest neighbor query I find a
minimum distance of d, it will take me d minimum steps to approach the
existing contour. So I don't need to check the distance again for at
least d steps. So the algorithm can proceed 1) obtain the distance d
from the existing contour to the most recently obtained point 2) make
d steps adding points that meet the value criteria 3) repeat.

The second use case arose with gdmodule, which can only allocate 256
colors, which I cache as a dict from rgb tuples (eg, 0.0, 0.05, 1.0)
to color. When the total number of color allocations is made, and a
new rgb request comes into the color manager, I pick the already
allocated point in rgb space closest to the requested point.

I'll try David Eppstein's approach tomorrow and see how this fares.

Thanks to all for suggestions,
John Hunter

Andrew Dalke
Guest
Posts: n/a

 11-04-2003
> for point in p:
> distance = math.sqrt((new_point[0]-point[0])**2 \
> +(new_point[1]-point[1])**2)

> I really don't know how you can make this faster. There might be a
> library that has a distance between two points function that could
> speed it up.

An easy way is to move the math.sqrt call outside the loop, since
sqrt(d1) < sqrt(d2) iff d1 < d2 (when d1,d2>=0)

Andrew
(E-Mail Removed)

G.J.Giezeman
Guest
Posts: n/a

 11-04-2003
Isaac To wrote:

>>>>>>"John" == John Hunter <(E-Mail Removed)> writes:

>
>
> John> Given a new point x,y, I would like to find the point in the list
> John> closest to x,y. I have to do this a lot, in an inner loop, and
> John> then I add each new point x,y to the list. I know the range of x
> John> and y in advance.
>
> John> One solution that comes to mind is to partition to space into
> John> quadrants and store the elements by quadrant. When a new element
> John> comes in, identify it's quadrant and only search the appropriate
> John> quadrant for nearest neighbor. This could be done recursively, a
> John> 2D binary search of sorts....
>
> By recursion your solution would work in O(log n) time. The construction
> would take O(n log n) time. Unluckily, it can return the wrong point, as
> the nearest point within the nearest quadrant might not be the nearest
> point.
>
> The problem is a well-studied basic computational geometry problem, although
> I don't really know any Python code that actually do it. Try to look at the
> web for "Voronoi diagrams" and "radial triangulation" to understand how to
> solve it properly in the above mentioned (perhaps randomized) time
> complexity.
>
> Regards,
> Isaac.

A solution in C++ is using the CGAL-library (www.cgal.org). Look in the
index of the basic library and search for 'nearest'. It will point you
to Delaunay triangulations, which, together with a triangulation
hierarchy, will give O(log n) time complexity, except in pathological
cases. You can call C++ code from python.
B.t.w., there will be a new release of the CGAL library very soon
(probably this week).