Velocity Reviews > Ruby > [QUIZ] The Smallest Circle (#157)

# [QUIZ] The Smallest Circle (#157)

Bill Kelly

 02-17-2008

From: "Lionel Bouton" <(E-Mail Removed)>
> Alex Shulgin wrote:
>>> http://tastyspleen.net/~billk/ruby/q...rcle/157_small...
>>>

>>
>> I'm trying to utilize gnuplot for visualization. With any luck I can
>> post the visual part without spoiling the quiz.
>>

>
> Glad to see that I'm not the sole old-school minded

Hehe... if Wikipedia is to be believed, gnuplot is 22 years old, and
OpenGL is 16 years old. Practically siblings.

Regards,

Bill

Xavier Noria

 02-17-2008
Without looking at the literature (on purpose), I thought that an exact
solution could perhaps be computed from the convex hull of the set of
points. I wrote a self-made algorithm to compute the convex hull, but
ran out of time to continue with the problem.

Anyone?
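For anyone wanting to follow the convex hull route, a compact self-contained sketch (Andrew's monotone chain, my own illustrative code rather than fxn's) looks like this:

```ruby
# Convex hull via Andrew's monotone chain, O(n log n).
# Points are [x, y] pairs; returns hull vertices in counter-clockwise order.
def cross(o, a, b)
  (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
end

def convex_hull(points)
  pts = points.uniq.sort
  return pts if pts.size <= 2
  lower = []
  pts.each do |p|
    lower.pop while lower.size >= 2 && cross(lower[-2], lower[-1], p) <= 0
    lower << p
  end
  upper = []
  pts.reverse_each do |p|
    upper.pop while upper.size >= 2 && cross(upper[-2], upper[-1], p) <= 0
    upper << p
  end
  lower[0...-1] + upper[0...-1]  # drop the duplicated endpoints
end
```

Only the hull vertices can lie on the smallest enclosing circle, so this is a useful filtering step even without an exact algorithm on top.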

-- fxn

Lionel Bouton

 02-17-2008
Lionel Bouton wrote:
> Hi,
>
> here's my solution. I initially thought that the best circle would be
> either defined by a diameter (2 points) or by a combination of 3
> points through which it passes. The idea was that given three points
> defining a circle, a fourth point is either in the circle or defines,
> with 2 of the 3 originals, a circle that includes the remaining point.
>
> This would be neat, as I wasn't so happy about a gradient-walking
> solution (in some cases it can be proven that the walk reaches the
> optimum deterministically, with no local maximums, but that's not the
> general case). Unfortunately it can be proven that such a circle
> encircles all points, but not that it's the smallest (and my
> experiments proved that it isn't).

Update: it seems I was wrong and probably had a bug in that
implementation, as I just read this algorithm described as the best
known solution until 1983 in the paper referenced by Douglas:

http://www.cs.mcgill.ca/~cs507/projects/1998/jacob/

On a purely random set, my gradient walking is 27 times faster on 10000
points than Megiddo's algorithm (but certainly slower in some cases;
Douglas' solution gets the result in a fairly constant time for a given
set size).

Hum, after testing, changing the initial step for the gradient walking
from

max width of the set / point_number
to
max width of the set / 4

makes it competitive in all cases (12s on 10000 points instead of 270s
for Megiddo's), not only on balanced data sets (which random ones are).
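The step-halving walk is easier to see in code. A rough, illustrative reconstruction (my own sketch of the general technique, not Lionel's actual code; the function name and the 1e-12 termination threshold are assumptions):

```ruby
# Sketch of the gradient-walking idea: start at the bounding-box center,
# repeatedly step toward the farthest point, halving the step whenever
# the move makes the enclosing radius worse. Points are [x, y] pairs;
# returns [center, radius].
def walk_smallest_circle(points)
  xs = points.map { |p| p[0] }
  ys = points.map { |p| p[1] }
  cx = (xs.min + xs.max) / 2.0
  cy = (ys.min + ys.max) / 2.0
  # Initial step: max width of the set / 4, as discussed above.
  step = [xs.max - xs.min, ys.max - ys.min].max / 4.0
  radius = 0.0
  while step > 1e-12
    far = points.max_by { |x, y| (x - cx)**2 + (y - cy)**2 }
    radius = Math.hypot(far[0] - cx, far[1] - cy)
    break if radius.zero?  # all points coincide
    # Candidate: move the center toward the farthest point by one step.
    nx = cx + (far[0] - cx) / radius * step
    ny = cy + (far[1] - cy) / radius * step
    nfar = points.max_by { |x, y| (x - nx)**2 + (y - ny)**2 }
    nradius = Math.hypot(nfar[0] - nx, nfar[1] - ny)
    if nradius < radius
      cx, cy, radius = nx, ny, nradius  # accept the improvement
    else
      step /= 2.0  # otherwise shrink the step and retry
    end
  end
  [[cx, cy], radius]
end
```

The loop ends once the step is too small to improve anything, which is what makes the iteration count depend on float precision rather than on the number of points.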

Lionel

Lionel Bouton

 02-17-2008
Bill Kelly wrote:
>
> Hehe... if Wikipedia is to be believed, gnuplot is 22 years old, and
> OpenGL is 16 years old. Practically siblings.
>

True, but using OpenGL for rendering data sets was only recently made
easy, thanks to Ruby.

Lionel

Bill Kelly

 02-17-2008

From: "Frank Fischer" <(E-Mail Removed)-chemnitz.de>
> On 2008-02-17, Justin Ethier <(E-Mail Removed)> wrote:
>>
>> Had a lot of fun with this one! My solution works by taking the average of
>> all points as the center, and the farthest outlying point as the radius. An
>> optimal solution is then found by moving the circle center towards the
>> outlier as the radius is reduced, while all points still are within the
>> radius. The same principles could be applied for higher dimensions as well.

>
> Nice solution, this was also one of my first ideas. The problem with
> this approach is if many points are close to each other and only few
> points are far away. In this case the center is near that mass of points
> and the line-search may fail to find an optimal solution. For example:
>
> puts encircle( ([[0.0,0.0]] * 100 + [[0.0,1.0], [1.0,0.0]]).map{|p|
> Point.new(*p)})
>
> (100 points at (0,0), one at (1,0) and one at (0,1)) gives a
> suboptimal solution; the optimum would be center (1/2, 1/2) and
> radius 1/sqrt(2).

My variation using an axis-aligned bounding box to pick the
initial center point appears to handle the above case.

However, I would not be surprised if it still finds some
sub-optimal solutions... :-/
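The difference between the two starting points is easy to check on Frank's example above (a quick illustrative computation, not code from the thread):

```ruby
# 100 points at the origin plus (1,0) and (0,1): the centroid is dragged
# toward the mass of points, while the bounding-box center is not.
points = [[0.0, 0.0]] * 100 + [[1.0, 0.0], [0.0, 1.0]]

centroid    = points.transpose.map { |c| c.sum / c.size }
bbox_center = points.transpose.map { |c| (c.min + c.max) / 2.0 }

# centroid is roughly (0.0098, 0.0098); bbox_center is exactly
# (0.5, 0.5), which happens to be the optimal center for this set.
```

Starting the line search from (0.5, 0.5) instead of near the origin is what lets the bounding-box variation handle this degenerate case.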

Regards,

Bill

ThoML

 02-17-2008
> And again I reply to my own post.

This is becoming a bad habit. Sorry. This slightly modified version
uses the center of the two points with maximum distance as the starting
point. For random sets the results fit quite well most of the time --
though on certain sets it's farther off than the mass-center version.

Regards,
Thomas.

def encircle(points) # takes an array of Point objects, returns a Circle
  return if points.nil? || points.empty?
  points = points.uniq

  return Circle.new(points[0], 0) if points.size == 1

  # Start from the midpoint of the two farthest-apart points.
  m, n = points.combination(2).sort_by {|a, b| -distance(a, b)}[0]
  a = Point.new((m.x + n.x) / 2.0, (m.y + n.y) / 2.0)
  return Circle.new(a, distance(a, m)) unless points.size > 2

  points = points.sort_by {|p| -distance(a, p)}
  f = points[0]
  b = med(f, a)
  e = 1e-10
  1000.times do
    db = distance(f, b)
    if points.all? {|p| distance(b, p) <= db + e}
      da = distance(f, a)
      if (da - db).abs <= e
        return Circle.new(b, db)
      else
        a, b = b, med(f, b)
      end
    else
      b = med(a, b)
    end
  end
  raise RuntimeError
end

def med(a, b)
  Point.new((b.x - a.x) / 2.0 + a.x, (b.y - a.y) / 2.0 + a.y)
end

def distance(p1, p2)
  Math.sqrt((p1.x - p2.x) ** 2 + (p1.y - p2.y) ** 2)
end
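The snippet above assumes `Point` and `Circle` classes defined elsewhere in the thread; minimal stand-ins (my guess at the interface) plus the two-point case done by hand would be:

```ruby
# Minimal stand-ins for the Point and Circle objects the code expects.
Point  = Struct.new(:x, :y)
Circle = Struct.new(:center, :radius)

# Two-point case: the circle on the diameter (0,0)-(2,2) has its
# center at the midpoint and radius half the diameter.
a = Point.new(0.0, 0.0)
b = Point.new(2.0, 2.0)
mid = Point.new((a.x + b.x) / 2.0, (a.y + b.y) / 2.0)
circle = Circle.new(mid, Math.hypot(b.x - mid.x, b.y - mid.y))
```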

Alex Shulgin

 02-17-2008
On Feb 15, 3:19 pm, Matthew D Moss <(E-Mail Removed)> wrote:
>
> As mentioned, this one may be more difficult than it seems. Feel free
> to estimate the smallest circle, if you get stuck on getting the
> exact solution.

This is really way more difficult than it seemed to me...

I was approaching a geometrical solution, only to be shocked that there
are cases where not both ends of the longest chord lie on the smallest
circle!

Check out these points (found by pure luck, with the help of rand):

-0.385628 -0.435444
-0.319182 +0.478400
+0.257012 +0.333913
-0.243603 -0.496925

The longest chord runs from points[0] to points[2] (length 1.002445),
but the correct solution appears to be the circumcircle of the triangle
formed by points [1], [2] and [3]. It includes the second-longest
chord, [1]->[3], of length 0.978249.
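This can be verified numerically with the standard circumcircle formula (my own check, not code from the thread):

```ruby
# Circumcircle of three points via the standard determinant formula.
def circumcircle(a, b, c)
  ax, ay = a; bx, by = b; cx, cy = c
  d  = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
  a2, b2, c2 = ax**2 + ay**2, bx**2 + by**2, cx**2 + cy**2
  ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
  uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
  [[ux, uy], Math.hypot(ax - ux, ay - uy)]
end

pts = [[-0.385628, -0.435444], [-0.319182, 0.478400],
       [ 0.257012,  0.333913], [-0.243603, -0.496925]]

# The circumcircle of pts[1], pts[2], pts[3] (radius ~0.5115)
# contains all four points...
center, r = circumcircle(pts[1], pts[2], pts[3])
inside = pts.all? { |x, y| Math.hypot(x - center[0], y - center[1]) <= r + 1e-9 }

# ...but the circle on the longest chord pts[0]-pts[2]
# (radius ~0.5012) leaves pts[1] outside.
mx = (pts[0][0] + pts[2][0]) / 2.0
my = (pts[0][1] + pts[2][1]) / 2.0
chord_r = Math.hypot(pts[0][0] - mx, pts[0][1] - my)
```

So the longest-chord circle is smaller, but it is not an enclosing circle, which is exactly the trap Alex describes.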

--
Alex

Lionel Bouton

 02-19-2008
Philipp Hofmann wrote:
> Here is my solution. Although it uses a brute force approach if the
> first attempts (driven by obvious constraints) at determining the
> circle fail, it scales on larger sets, because the given equal
> distribution of random points forms an approximate square on these
> sets.

Wow, that was awfully fast for the random distribution; it takes more
time to display the result than to compute it, even on 10000 points!

On badly shaped point clouds it suffers greatly. I couldn't get it to
compute the circle for 1000 points when I fed it:

class Point < Struct.new(:x, :y)
  def self.random
    ro = rand / 2
    theta = Math::PI * 2 * rand
    Point.new(ro * Math.cos(theta) + 0.5, ro * Math.sin(theta) + 0.5)
  end
  [...]
end

The ruby process happily took more than 800MB and I had to kill it while
I could still get some feedback from my poor laptop.

Note :
I found it suspicious that I could get better performance than the
accepted best mathematical solution in the general case and I reflected
a while on that.
I think the reason why the gradient walking is faster is that it is
inherently inaccurate. Megiddo's algorithm defines the exact position of
the center using a geometrical process. Unfortunately computers must
manipulate floating point data when describing points, segments,
vectors, circles, ... so any computed result is inherently inaccurate.
The gradient walking method profits from this inaccuracy: it stops when
the step becomes too small to move the current best position because of
rounding errors. This happens in O(b) steps, where b is the number of
bits in the floating-point mantissa (53 for IEEE 754 doubles, so < 64),
which is a huge win compared to the O(n) of Megiddo's, where n is the
number of *points*.

I benched the hull-computing method and I believe we can get the best
of both worlds: on any distribution of points I threw at it (random in
a square, in a disk, or on a circle), computing the convex hull took
0.4s for 10000 points on my laptop. To be as fast as possible, we could
run your algorithm up to the point where it switches to the brute force
method. Then, instead of looking for the best circle based on three
points of the hull, we'd use the gradient walking method. Using only
the hull points saves time in the most common cases, and the filtering
is quick enough to be negligible in the worst case, when no point can
be filtered out.

My tests, filtering the points with your ConvexHull class before
applying gradient walking, return in 0.5s for the initial distribution
(on a square) or on a disk, and fall back to 12-13s in the worst case:
distribution on a circle.

Lionel.

Lionel Bouton

 02-19-2008
Philipp Hofmann wrote:
>
> What you call a 'badly shaped point cloud' is in fact an approximate
> circle.

In fact I initially tested with a disk (which is an intermediate level
between the square and the circle for the number of points on the
convex hull); the worst shape is indeed a circle (obtained by replacing
ro with a constant in my random method, for people still following):
the number of polygon vertices is then the total number of points.

> I'm confident it doesn't do any harm to your poor laptop this time.
>

The little thing survived! Clarkson's is good for the worst case (every
point on the same circle), which fits common sense (the circle should
encircle all points, so the first iteration should win, minus rounding
errors in floating point computations); it's nearly 3x as fast as the
gradient walking in this case. The speed doesn't change much across
various random shapes (between 2 and 4s on my laptop). The gradient
walking still wins when only a small portion of the points are on the
convex hull (random on a disk, for example: 0.5s vs 4s).

I believe gradient walking should be faster when less than one third of
the points are on the convex hull and slower otherwise, but that's a
rough estimate based on observed behavior.

One thing that I don't like, though, is that the total time is
difficult to predict. Probabilistic algorithms often have good
performance, but I'm always uneasy when I'm not sure how much time they
can spend before returning, or even when they will exhibit bad
behaviour. It's a matter of the constraints in which the code must fit,
and I have to return results in a timely manner in most of my projects.

Is there any public paper describing the performance behavior of the
algorithm? All scientific papers returned by Google point to
paid-subscription materials.

Lionel

PS: is it me or is this becoming more of an algorithm quiz than a Ruby
quiz? It's my first Quiz and I was expecting to see some clever use of
Ruby, but in this kind of Quiz I don't see much room for elegant code.
Not that I'm complaining: I love studying the performance of algorithms too.

Philipp Hofmann

 02-20-2008

On Wed, Feb 20, 2008 at 07:20:06AM +0900, Lionel Bouton wrote:
>
> One thing that I don't like though is that the total time is difficult
> to predict. Probabilistic algorithms often have good performance but I'm
> always uneasy when I'm not sure how much time they can spend before
> returning and even when they will exhibit bad behaviour. It's a matter
> of constraints in which the code must fit and I mostly have to return
> results in a timely manner in most of my projects
>
> Is there any public paper describing the performance behavior of the
> algorithm? All scientific papers returned by Google point to
> paid-subscription materials.

Sorry, I haven't come across a public paper on Clarkson's algorithm
either. But it's proven that the algorithm terminates in O(log(n))
iterations in expectation, n of course being the number of given points.

I'm not going to do a full proof here, but here's an outline.

You need at most 3 points to describe the smallest enclosing circle. As
long as you haven't found it, each turn you increase the chance of every
point that lies outside the circle determined by 13 random points, by
doubling their relative frequencies.

After k iterations, at least one of the (at most) three points you are
looking for has a relative frequency of at least 2**(k/3).

On average the sum of relative frequencies increases by a factor of
1+3/13 each iteration.

For increasing k this leads to a contradiction: at some point the
relative frequency 2**(k/3) of at least one point would exceed the
overall sum of relative frequencies, n*(1+3/13)**k, if the algorithm
hadn't come to an end yet.
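Since 2**(1/3) (~1.26) is larger than 16/13 (~1.23), that crossover happens after O(log n) iterations. A quick numeric sanity check of the counting argument (illustrative only; n = 10000 is my choice, not from the thread):

```ruby
# Find the first iteration count k at which the guaranteed per-point
# growth 2**(k/3) overtakes the average total n * (16/13)**k, i.e. the
# latest the algorithm could still be running for n = 10_000 points.
n = 10_000
k = 1
k += 1 until 2.0**(k / 3.0) > n * (16.0 / 13.0)**k
# k comes out in the few hundreds -- logarithmic in n, since the bound
# grows like log(n) / (log(2)/3 - log(16/13)).
```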

The only reference I found on the net where it is at least partially
explained, and from which I got the outline of this proof, is this one
(but be warned, it's in German):

http://www-i1.informatik.rwth-aachen...mus/algo42.php

> PS: is it me or is this becoming more an algorithm-Quiz than a
> Ruby-Quiz? It's my first Quiz and I was expecting to see some clever use
> of Ruby, but in this kind of Quiz I don't see much place for elegant
> code.
> Not that I complain: I love to study the performance of algorithms too

Even if this one seems to be more about the algorithm, we still do it
in Ruby, right? Anyway, we'll see what our brand new quizmaster will
focus on.

g phil