(E-Mail Removed) wrote:

> >Arthur wrote:

> ...

> >> *Our* intelligence seems to give us a read as to where on the bell
> >> curve a particular event may lie, or at least some sense of when we are
> >> at an

>

>>Wrong: human beings are *eager* pattern-matching devices, extremely prone
>>to detecting "patterns" that just don't exist in statistically significant
>>ways. There's quite a substantial body of literature, by now, on the
>>general issue of frequent fallacies in reasoning about probabilities.

>

> I can accept a "poorly expressed". Not sure I can sign onto a "wrong".
The single line of text following this one is one of the longest I've
ever seen posted to Usenet; my compliments. Not sure why KDE's KNode
showed it to me as a single line (with a left-right scrollbar to let
me eventually view it all) but managed to fold it for reply purposes!-).

> actual occurrence of an "unlikely" occurrence. That sense that something
> unlikely has occurred is not wrong. And it is hard to put a finger on
> everything that goes into coming to such a conclusion. And therefore, I
> would presume, difficult to program.
Actually, I stick with our line from back in the '80s, when I was doing
speech recognition with IBM Research on a strictly probabilistic basis:
what we had on our T-shirts was

P(A|B) = P(B|A)P(A)/P(B)

and you know, there IS really nothing more to it than this formula from
1764... almost. And it IS easy to program, if programmers were in
fact humble enough to study and apply statistics and probability rather
than looking for "artificial intelligence" silver bullets!-)
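To illustrate how little code the formula needs, here is a minimal sketch. It assumes a finite set of competing hypotheses A whose priors P(A) and likelihoods P(B|A) for one observed B are already known. The function name `posterior` and all the numbers are made up for illustration. P(B) is obtained by summing P(B|A)P(A) over the hypotheses:

```python
def posterior(priors, likelihoods):
    """Return P(A|B) for every hypothesis A.

    priors:      dict mapping hypothesis A -> P(A)
    likelihoods: dict mapping hypothesis A -> P(B|A) for the one observed B
    (hypothetical helper, for illustration only)
    """
    # Numerator of Bayes' theorem, P(B|A) * P(A), for each hypothesis.
    joint = {a: likelihoods[a] * priors[a] for a in priors}
    # P(B) by the law of total probability: sum of all the numerators.
    evidence = sum(joint.values())
    return {a: j / evidence for a, j in joint.items()}


# Made-up numbers: which word was spoken, given some acoustic evidence B?
priors = {"write": 0.5, "right": 0.3, "rite": 0.2}
likelihoods = {"write": 0.2, "right": 0.4, "rite": 0.4}
post = posterior(priors, likelihoods)
best = max(post, key=post.get)
print(best, post)
```

With these invented figures "right" wins with posterior 0.4, although "write" has the higher prior, because its likelihood is twice as large.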

One thing you do have to estimate heuristically, in order to be able
to apply Bayes' theorem to many cases of practical use, is the probability
at any time (given an existing body of observations) that the next thing
(combination of features) you're going to observe is going to be one
you have never observed yet (as opposed to one among the set you did
observe). Turing formulated a good heuristic for that, and, I'm told,
that heuristic is widely used in biometrics (trying to determine
correlations between, e.g., umpteen possible features of a butterfly:
long vs. short legs, ditto antennae, coloring, wing-shape details, ...).
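This estimate matters because a model that gives zero probability to everything not yet observed assigns P(B) = 0 to any genuinely new observation, so the division in Bayes' formula breaks down. Below is a minimal sketch of one simple way to use such an estimate. It assumes the novelty probability `p_new` comes from some heuristic and that the remaining mass is split among already-seen items in proportion to their counts. This is only one possible scheme, not necessarily what any real recognition system did, and the name `predictive` is hypothetical:

```python
from collections import Counter


def predictive(observations, p_new):
    """Hypothetical helper: predictive distribution for the next observation.

    p_new is an externally supplied estimate that the next observation is
    one never seen before; the remaining 1 - p_new is shared among the
    already-seen items in proportion to how often each was observed.
    Returns (probabilities of seen items, mass reserved for novelties).
    """
    counts = Counter(observations)
    total = len(observations)
    seen = {item: (1.0 - p_new) * c / total for item, c in counts.items()}
    return seen, p_new


seen, unseen = predictive("a b a c".split(), 0.25)
print(seen, unseen)
```

Here "a" (seen twice out of four) gets 0.75 * 2/4 = 0.375, "b" and "c" get 0.1875 each, and 0.25 is left over for anything new, so the whole distribution still sums to 1.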

I think my own heuristic (a bit more prudent/pessimistic than Turing's)
works even better (we did validate that in terms of prediction performance
of recognition systems using either heuristic but otherwise identical, and
I also have handwaving considerations to justify it). Turing's heuristic
boils down to: number of different observations that were made ONCE, divided
by total number of observations. So, if your observations so far have been:

1 2 1 3 5 3 5 4 7 7 ...

having made 10 observations in total, of which two (items 2 and 4) were
observed only once, Turing's heuristic would predict a probability of 0.2
for the next observation being "a surprise" (one never seen before, i.e.
one not in the set {1,2,3,4,5,7}). My heuristic boils down to: number
of _different_ observations, divided by total number of observations; so,
my heuristic would predict a probability of 0.6 for the next observation
being "a surprise" (6 different things observed in 10 observations).

The difference between the two predictions is never this large in practical
use cases (with MANY observations having been made; ten isn't "many"), and
of course there are all sorts of implicit hypotheses (e.g., a practically
infinite alphabet of possible observations compared to the number of actual
observations: when each observation is a combination of many separate
features, combinatorial explosion basically guarantees that). One
could no doubt adjust either heuristic to work better for cases where
these hypotheses may fail (e.g., after observing "4 3 1 2" both heuristics
predict a surprise probability of 1.0, as if repetition of any of the
4 observations already made, once each, were impossible; that is clearly
over-predicting surprises!-).
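Both heuristics take only a couple of lines each. Below is a minimal sketch that assumes the observations are hashable values. The function names are hypothetical. Turing's rule, "items seen exactly once divided by total observations", is what is usually called the Good-Turing estimate of the probability of unseen items. The sketch reproduces the numbers worked out above:

```python
from collections import Counter


def turing_surprise(observations):
    """Turing's heuristic: number of items seen exactly once / total observations."""
    if not observations:
        raise ValueError("need at least one observation")
    counts = Counter(observations)
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(observations)


def distinct_surprise(observations):
    """The more pessimistic heuristic: number of distinct items / total observations."""
    if not observations:
        raise ValueError("need at least one observation")
    return len(set(observations)) / len(observations)


obs = [1, 2, 1, 3, 5, 3, 5, 4, 7, 7]
print(turing_surprise(obs), distinct_surprise(obs))      # 0.2 0.6

short = [4, 3, 1, 2]
print(turing_surprise(short), distinct_surprise(short))  # 1.0 1.0
```

The second pair shows the failure case mentioned above: with four distinct single observations, both rules claim the next one is certain to be new.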

Alex