On Mon, 02 Feb 2004 17:07:52 -0500, (E-Mail Removed)

(David M. Cooke) wrote:

>At some point, "Batista, Facundo" <(E-Mail Removed)> wrote:

>

>> danb_83 wrote:

>>

>> #- On the other hand, when I say that I am 1.80 m tall, it doesn't imply

>> #- that humans height comes in discrete packets of 0.01 m. It

>> #- means that

>> #- I'm *somewhere* between 1.795 and 1.805 m tall, depending on my

>> #- posture and the time of day, and "1.80" is just a convenient

>> #- approximation. And it wouldn't be inaccurate to express my height as

>> #- 0x1.CC (=1.796875) or (base 12) 1.97 (=1.7986111...) meters, because

>> #- these are within the tolerance of the measurement. So number base

>> #- doesn't matter here.

>>

>> Are you saying that it's ok to store your number imprecisely because you

>> don't take well measures?

>

>What we need for this is an interval type. 1.80 m shouldn't be stored

>as '1.80', but as '1.80 +/- 0.005', and operations such as addition

>and multiplication should propagate the intervals.

I disagree with this, not because it is a bad idea to keep track of

precision, but because this should not be a part of the float type or

of basic arithmetic operations.

When you write a value with its precision specified in the form of an

interval, that interval is a second number. The value with the

precision is a compound representation, built up using simpler

components. It doesn't mean that the components no longer have uses

outside of the compound. In Python, the same should apply - a numeric

type that can track precision sounds useful, but it shouldn't replace

the existing float.

One good reason is simply that knowledge of the precision is only

sometimes useful. As an obvious example, what would be the point of

keeping track of the precision of the calculations in a 3D game?

There is none, as the information about precision has no bearing on

the rendering of the image.

Besides this, there is a much more fundamental problem.

The whole point of using an imprecise representation is that

manipulating a perfect representation is impractical - mainly slow.

It is true that in general the source is inherently approximate too,

meaning that floats are quite a good match for the physical

measurements they are often used to represent, but still if it were

practical to do perfect arithmetic on those approximate values it

would give slightly more precise answers as the arithmetic would not

introduce additional sources of error.

Having an approximate representation with an interval sounds good, but

remember that one error source is the arithmetic itself - e.g. 1.0 /

3.0 cannot be finitely represented in either binary or decimal without

error (except as a rational, of course).
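
To make the 1.0 / 3.0 point concrete, here is a minimal sketch using Python's standard fractions and decimal modules. The variable names are only illustrative, and the 28-digit decimal precision is an arbitrary choice; any finite precision leaves a nonzero error.

```python
from decimal import Decimal, localcontext
from fractions import Fraction

third_exact = Fraction(1, 3)      # exact rational value of 1/3
third_float = 1.0 / 3.0           # nearest binary double to 1/3

# Fraction(float) converts the stored binary value exactly,
# so the difference is the true representation error.
float_error = Fraction(third_float) - third_exact

with localcontext() as ctx:
    ctx.prec = 28                 # arbitrary illustrative precision
    third_decimal = Decimal(1) / Decimal(3)

# The decimal result is rounded to 28 digits, so it is inexact as well.
decimal_error = Fraction(third_decimal) - third_exact

print(float_error, decimal_error)
```

Only the Fraction holds the value without error, and rationals are exactly the kind of representation that gets slow as the calculations pile up.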

So, in answer to your question...

>How to do that is another question: for addition, do you add the

>magnitudes of the intervals, or use the square root of the sums of the

>squares, or something else? It greatly depends on what _type_ of error

>0.005 measures (is it the width of a Gaussian distribution? a uniform

>distribution? something skewed that's not representable by one

>number?).

None of these is sufficient - they may track the errors resulting from

measurement issues (if you choose the appropriate method for your

application) but none takes into account errors resulting from the

imprecision of the arithmetic. Furthermore, to keep track of such

imprecision precisely means you need an infinitely precise numeric

representation for your interval - and if it were practical to do that,

it would be far better to just use that representation for the value

itself.
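
As a rough illustration, the two combination rules from the quoted question might look like this in Python. The function names add_linear and add_quadrature are hypothetical, and the quadrature rule assumes the two errors are independent. Neither rule says anything about the rounding performed by the float addition itself, which the last lines expose by redoing the sum exactly with Fraction.

```python
import math
from fractions import Fraction

def add_linear(a, da, b, db):
    """Worst-case rule: the tolerances simply add."""
    return a + b, da + db

def add_quadrature(a, da, b, db):
    """Rule for independent errors: root of the sum of the squares."""
    return a + b, math.sqrt(da * da + db * db)

# Two lengths, each measured to +/- 0.005 m.
linear_sum, linear_err = add_linear(1.80, 0.005, 1.65, 0.005)
quad_sum, quad_err = add_quadrature(1.80, 0.005, 1.65, 0.005)

# Both rules report a zero tolerance when the inputs have zero tolerance...
total, err = add_linear(0.1, 0.0, 0.2, 0.0)

# ...yet the float addition still rounds. Compare the float result with
# the exact sum of the two stored binary values.
exact_sum = Fraction(0.1) + Fraction(0.2)
rounding_gap = abs(Fraction(total) - exact_sum)

print(linear_err, quad_err, err, rounding_gap)
```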

This doesn't mean that tracking precision is a bad idea. It just means

that when it is done, the error interval itself should be imprecise.

You should have the guarantee that the real value is never going to be

outside of the given bounds, but not the guarantee that the bounds are

as close together as possible - the bounds should be allowed to get a

little further apart to allow for imprecision in the calculation of

the interval.
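
A minimal sketch of such a deliberately pessimistic interval, assuming Python 3.9 or later for math.nextafter. The Interval class and its contains method are hypothetical names for illustration, not an existing library. Each bound is pushed one float outward after every operation, so the bounds drift slightly apart but (barring overflow or NaN) never exclude the exact result.

```python
import math
from dataclasses import dataclass
from fractions import Fraction

@dataclass(frozen=True)
class Interval:
    lo: float
    hi: float

    def __add__(self, other):
        # Round-to-nearest may land on either side of the exact sum,
        # so step each bound one representable float outward.
        return Interval(math.nextafter(self.lo + other.lo, -math.inf),
                        math.nextafter(self.hi + other.hi, math.inf))

    def __mul__(self, other):
        # The extreme products occur at the corners of the two intervals.
        products = [self.lo * other.lo, self.lo * other.hi,
                    self.hi * other.lo, self.hi * other.hi]
        return Interval(math.nextafter(min(products), -math.inf),
                        math.nextafter(max(products), math.inf))

    def contains(self, exact):
        # Compare exactly, using the precise value of each stored bound.
        return Fraction(self.lo) <= exact <= Fraction(self.hi)

height = Interval(1.795, 1.805)
doubled = height + height
print(doubled, doubled.contains(2 * Fraction(height.lo)))
```

The price is that the interval is no longer the tightest possible one, which is exactly the trade described above.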

And if the error interval is itself an approximation, why track it on

every single arithmetic operation? Unless you have a specific good

reason to do so, it makes much more sense to handle the precision

tracking at a higher level. And as those higher level operations are

often going to be application specific, having a single library for it

(i.e. not tailored to some particular type of task) is IMO unlikely to

work.

For instance, consider calculating and applying a 3D rotation matrix

to a vector. If you track errors on every float value, that is 9

values in the matrix with error values (due to limited precision trig

functions etc) and 3 values in the vector, a dozen for the

intermediate results in the matrix multiplication, and 3 error

intervals for the 3 dimensions of the output vector. But the odds are

that all you want is a single float value - the maximum distance

between the real point and the point represented by the output vector,

and you can probably get a good value for that by multiplying the

length of the input vector by some 'potential error from rotation'

constant.
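
A sketch of that higher-level approach, in plain Python. The function rotate_z and the constant ROTATION_RELATIVE_ERROR are hypothetical; the constant's value is a placeholder assumption, not a derived bound, and a real application would have to justify its own figure.

```python
import math

# Hypothetical 'potential error from rotation' constant: an assumed
# relative error for one rotation, NOT a rigorously derived bound.
ROTATION_RELATIVE_ERROR = 8 * 2.0 ** -53

def rotate_z(vector, angle):
    """Rotate a 3D vector about the z axis.

    Returns the rotated vector plus one float estimating the maximum
    distance between the real point and the computed point.
    """
    c, s = math.cos(angle), math.sin(angle)
    x, y, z = vector
    rotated = (c * x - s * y, s * x + c * y, z)
    # One error figure for the whole operation, scaled by the length of
    # the input vector, instead of an interval on every intermediate float.
    length = math.sqrt(x * x + y * y + z * z)
    return rotated, length * ROTATION_RELATIVE_ERROR

point, max_error = rotate_z((3.0, 4.0, 0.0), math.pi / 2)
print(point, max_error)
```

That is one extra float per vector, rather than an interval attached to every intermediate value.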

Incidentally, it would not always be appropriate to include arithmetic

errors in error intervals. For instance, some statistical interval

types do not guarantee that all values are within the interval range.

They may guarantee that 95% of values are within the interval, for

instance - _and_ that 5% of values are outside the interval. The 5%

outside is as important as the 95% inside, so there is no acceptable

direction to move the bounds a little 'just to be safe'.

In some cases, you might even want to track the error interval (from

arithmetic error) for your error interval value. I can certainly

imagine a result with the form...

The average widginess of a blodgit is 9.5 +/- 0.2

95% differ from the average by less than 2.7 +/- 0.03

Thus I can say that this randomly chosen blodgit has a

widginess of (9.5 +/- 0.2) +/- (2.7 +/- 0.03) with 95% confidence.

You might even get results like that if you had estimated the average

and distribution of widginess from a sample of the blodgits - in which

case, you may still need to account for the arithmetic error, which

requires potentially another four values.

--

Steve Horne

steve at ninereeds dot fsnet dot co dot uk